Abstract: When targeting a distribution that is artificially invariant under some permutations,
Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem,
rendering marginal inference particularly cumbersome. Such a situation arises, for example,
in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as
adaptive Metropolis (AM), which self-calibrates its proposal distribution using an online
estimate of the covariance matrix of the target, are no exception. To address the label-switching
issue, relabeling algorithms associate a permutation to each MCMC sample,
trying to obtain reasonable marginals. In the case of adaptive Metropolis [15], an online
relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably
consistent variant of AM that can cope with the label-switching problem. The idea is to
nest relabeling steps within the MCMC algorithm based on the estimation of a single covariance
matrix that is used both for adapting the covariance of the proposal distribution
in the Metropolis algorithm step and for online relabeling. We compare the behavior of
AMOR to similar relabeling methods. In the case of compactly supported target distributions,
we prove a strong law of large numbers for AMOR and its ergodicity. These are the
first results on the consistency of an online relabeling algorithm to our knowledge. The
proof underlines latent relations between relabeling and vector quantization.
approximation; vector quantization

1. Introduction

Markov chain Monte Carlo (MCMC) is a generic approach for exploring complex probability 
distributions based on sampling [24]. It has become the de facto standard tool in many applications
of Bayesian inference. However, a very common situation in which MCMC algorithms
face serious difficulties is when the target posterior distribution is known to be invariant under 
some permutations (or block permutations) of the variables. In that case, the difficulties are 
both computational, as most often the MCMC algorithm fails to validly visit all the modes of 
the posterior, and inferential, in particular rendering marginal posterior inference about the individual
variables particularly cumbersome [10]. In the literature, this latter difficulty is usually
referred to as the label switching problem [32]. The most well-known example of this situation is
when performing Bayesian inference in a mixture model. In this case the mixture likelihood is 
invariant to permuting the mixture components and, most often, the prior itself does not favor 
any specific ordering of the mixture components [9, 32, 17, 18, 22, 31, 19]. Another important 
example arises in signal processing with additive decomposition models. In this case, the ob- 
served signal is represented as the superposition of exchangeable signals, and the main goal is to 
recover the individual signals or their parameters. In addition, often the number of signals also 
has to be determined [30, 29, 6]. It was observed empirically that when the dimension of the 
model is not known, the reversible jump sampler [23] makes it easier to visit the multiple modes 
corresponding to the permutations but, of course, marginal inference becomes harder due to the 
additional difficulty of associating components between models of varying dimension. 

In this contribution, we address the label switching problem in the generic case where no 
useful external information on the target is known. This corresponds, for instance, to a posterior 
distribution when neither the likelihood is assumed to have a specific form, nor the prior is chosen 
to have conjugacy properties, which forbids the use of Gibbs sampling or other specialized 
sampling strategies. We assume, however, that the target is known to be invariant under some 
permutations of the parameters. This framework is typical, for instance, in experimental physics 
applications where the likelihood computation is commonly deferred to a black-box numerical 
code. In those cases, one cannot assume anything about the structure of the posterior or its 
conditional distributions, except that they should be invariant to some permutations of the 
parameters. We also restrict ourselves to the case where the dimension of the model is finite and 
known, so the parameters of the model are $\mathbb{R}^d$-valued for some fixed and finite $d$.

Adaptive MCMC algorithms can self-calibrate their internal parameters along the iterations 
in order to reach decent performance without (or with almost no) knowledge about the target 
distribution, eliminating the grueling step of tuning the proposals. Adaptive MCMC has been 
an active field of research in the last ten years, following the pioneering contribution of [15] — 
see [3] as well as the other papers in the same special issue of Statistics and Computing, along 
with [4, 2, 28]. Adaptive Metropolis (hereafter AM; [15]) and its variants aim at identifying
the unknown covariance structure of the target distribution along the run of a random walk 
Metropolis-Hastings algorithm with a multivariate Gaussian proposal. The rationale behind 
this approach is based on scaling results which suggest that, when $d$ tends to $+\infty$, the chain
correlation is minimized when the covariance matrix used in the proposal distribution matches, 
up to a constant that depends on the dimension, the covariance matrix of the target, for a large 
class of unimodal target distributions with independent marginals [25, 26]. AM thus progressively 
adapts, using a stochastic approximation scheme, the covariance of the proposal distribution to 
the estimated covariance of the target. 
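The stochastic approximation recursion behind this adaptation (Steps 12 and 13 of Algorithm 1) can be sketched in a few lines. The snippet below is only an illustration, not the authors' implementation: the function name `sa_update`, the synthetic Gaussian data, and the step size $\gamma_t = 1/t$ are assumptions made for the example. With that step size the recursive mean coincides with the ordinary sample mean, and the recursive covariance stays close to the sample covariance.

```python
import numpy as np

def sa_update(mu, sigma, x, gamma):
    """One stochastic-approximation step for the running mean and covariance."""
    diff = x - mu                      # deviation from the previous mean
    new_mu = mu + gamma * diff
    new_sigma = sigma + gamma * (np.outer(diff, diff) - sigma)
    return new_mu, new_sigma

# Hypothetical data: i.i.d. Gaussian draws standing in for an MCMC trajectory.
rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
xs = rng.multivariate_normal([1.0, -1.0], true_cov, size=20000)

mu, sigma = np.zeros(2), np.eye(2)     # arbitrary initial values
for t, x in enumerate(xs, start=1):
    mu, sigma = sa_update(mu, sigma, x, gamma=1.0 / t)

print(mu, xs.mean(axis=0))
print(sigma)
```

In an adaptive sampler the draws are correlated and the step size typically decays as $t^{-\beta}$, so the match with the empirical moments is only asymptotic.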

It has been empirically observed in [5], and we provide further evidence of this fact below 
in Section 2.2, that the efficiency of AM can be greatly impaired when label switching occurs. 
The reason for such a difficulty is obvious: if label switching occurs, the estimated covariance 
matrix no longer corresponds to the local shape of the modes of the posterior and so the ex- 
ploration can be far from optimal. In Section 2.2, we also provide some empirical evidence that 
off-the-shelf solutions to the label-switching problem, such as imposing identifiability constraints 
or post-processing the simulated sample, are not fully satisfactory. A key difficulty here is that 
most of the approaches proposed in the literature are based on post-processing of the simulated 
trajectories after the MCMC algorithm has been fully run [32, 17, 18, 22, 31, 19, 30]. Unfortu- 
nately, in the case of adaptive MCMC, post-processing cannot solve the improper exploration 
issue described above. On the other hand, online relabeling algorithms [23, 10, 11] often require 
manual tuning based on, for example, prior knowledge on the location of the redundant modes 
of the target. Without such manual tuning they often yield poor samplers, as we will show in
Section 2.2.

Our main purpose in this paper is to provide a provably consistent variant of AM that can 
cope with the label-switching problem. In [5], we proposed an adaptive Metropolis algorithm 
with online relabeling, called AMOR, based on the original idea of [9]. The idea is to nest 
relabeling steps within the MCMC algorithm based on the estimation of a single covariance 
matrix that is used both for adapting the covariance of the proposal distribution used in the 
Metropolis algorithm step and for online relabeling. Contrary to [9] , the AMOR algorithm also 
corrects for the relabelings using a modified acceptance ratio. 

In Section 2.2, we provide empirical evidence that the coupling established in AMOR between 
the criterion used for relabeling and the estimation of the covariance of the local modes of 
the posterior is beneficial to avoid the distortion of the marginal distributions. Furthermore, 
the example considered in Section 2.2 also demonstrates that the AMOR algorithm samples 
from non-trivial identifiable restrictions of the posterior distribution, that is, truncations of the 
posterior on regions where the posterior marginals are distinct but from which the complete 
posterior can be recovered by permutation. The study of the convergence of AMOR in Section 3 
reveals an interesting connection with the problem of optimal probabilistic quantization [13] 
which was implicit in earlier works on label switching. It was observed previously by [21] that 
some adjustments to the usual theory of stochastic approximation are necessary to analyze online
optimal quantization due to the presence of points where the mean field of the algorithm is not
differentiable. To circumvent this difficulty, we introduce the stable AMOR algorithm, a novel 
variant of the AMOR algorithm that avoids these problematic points of the parameter space. 
Finally, we establish consistency results for the stable AMOR algorithm, showing that it indeed 
asymptotically provides samples distributed under a suitably defined restriction of the posterior 
distribution in which the parameters are marginally identifiable. 
The paper is organized as follows. In Section 2, we describe the AMOR algorithm and compare
it with alternative approaches on an illustrative example. In Section 3, we address the
convergence of the algorithm. The detailed proofs are provided in the Appendix.

2. The AMOR algorithm 

In this section, we briefly review the AMOR algorithm and illustrate its performance on an 
artificial example. 

2.1. The algorithm 

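Let $\pi$ be a density with respect to (w.r.t.) the Lebesgue measure on $\mathbb{R}^d$ which is invariant to
the action of a group $\mathcal{P}$ of matrices, that is,

$$\forall x \in \mathbb{R}^d,\ \forall P \in \mathcal{P},\quad \pi(x) = \pi(Px).$$

Denote by $\mathcal{C}_d^+$ the set of $d \times d$ real positive definite matrices. For $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathcal{C}_d^+$, write $\theta = (\mu, \Sigma)$ and define
$L_\theta : \mathbb{R}^d \to \mathbb{R}_+$ by

$$L_\theta(x) = (x - \mu)^T \Sigma^{-1} (x - \mu), \tag{2.1}$$

and let $\mathcal{N}(\cdot \mid \mu, \Sigma)$ denote the Gaussian density with mean $\mu$ and covariance matrix $\Sigma$. Algorithm 1
describes the pseudocode of AMOR [5].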

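Algorithm 1.

AMOR($\pi(\cdot)$, $X_0$, $T$, $\theta_0 = (\mu_0, \Sigma_0)$, $c$, $(\gamma_t)_{t \ge 0}$)

 1  $S \leftarrow \emptyset$
 2  for $t \leftarrow 1$ to $T$
 3      $\Sigma \leftarrow c\,\Sigma_{t-1}$    ▷ scaled adaptive covariance
 4      $\tilde X \sim \mathcal{N}(\cdot \mid X_{t-1}, \Sigma)$    ▷ proposal
 5      $\tilde P \sim \arg\min_{P \in \mathcal{P}} L_{\theta_{t-1}}(P\tilde X)$    ▷ pick an optimal permutation
 6      $\tilde X \leftarrow \tilde P \tilde X$    ▷ permute the proposal
 7      if $\dfrac{\pi(\tilde X) \sum_{P \in \mathcal{P}} \mathcal{N}(P X_{t-1} \mid \tilde X, \Sigma)}{\pi(X_{t-1}) \sum_{P \in \mathcal{P}} \mathcal{N}(P \tilde X \mid X_{t-1}, \Sigma)} > \mathcal{U}_{[0,1]}$ then
 8          $X_t \leftarrow \tilde X$    ▷ accept
 9      else
10          $X_t \leftarrow X_{t-1}$    ▷ reject
11      $S \leftarrow S \cup \{X_t\}$    ▷ update the posterior sample
12      $\mu_t \leftarrow \mu_{t-1} + \gamma_t (X_t - \mu_{t-1})$
13      $\Sigma_t \leftarrow \Sigma_{t-1} + \gamma_t \big( (X_t - \mu_{t-1})(X_t - \mu_{t-1})^T - \Sigma_{t-1} \big)$
14      $\theta_t \leftarrow (\mu_t, \Sigma_t)$
15  return $S$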



To explain the proposal mechanism of AMOR, let $\mu_{t-1}$ and $\Sigma_{t-1}$ denote the sample mean
and the sample covariance matrix, respectively, at the end of iteration $t - 1$, and let $\theta_{t-1} =
(\mu_{t-1}, \Sigma_{t-1})$. Let also $S$ denote the MCMC sample at the end of iteration $t - 1$. At iteration
$t$, a point $\tilde X$ is first drawn from a Gaussian centered at the previous state $X_{t-1}$ and with
covariance $c\Sigma_{t-1}$, where $c$ implements the optimal scaling results in [25, 26] discussed in Section 1
(Steps 3 and 4). Then in Steps 5 and 6, $\tilde X$ is replaced by $\tilde P \tilde X$, where $\tilde P$ is a uniform draw over
the permutations in $\arg\min_{P \in \mathcal{P}} L_{\theta_{t-1}}(P\tilde X)$ that minimize the relabeling criterion (2.1)^1. This
relabeling step makes the augmented sample $S \cup \{\tilde P \tilde X\}$ look as Gaussian as possible among all
augmented sets $S \cup \{P\tilde X\}$, $P \in \mathcal{P}$. Formally, it can be seen as a projection onto the Voronoi cell
$V_{\theta_{t-1}}$, where

$$V_\theta = \{x \in \mathbb{X} : L_\theta(x) \le L_\theta(Px),\ \forall P \in \mathcal{P}\}. \tag{2.2}$$

Then, in Steps 7 to 10, the candidate $\tilde P \tilde X$ is accepted or rejected according to the usual
Metropolis-Hastings rule. Finally, the sample mean and covariance are adapted according to
a stochastic approximation scheme in Steps 12 to 14, where $(\gamma_t)_{t \ge 0}$ is a sequence of nonnegative
steps, usually set according to a polynomial decay $\gamma_t \sim \gamma_\star t^{-\beta}$ with $\beta \in (1/2, 1]$.

AMOR is a doubly adaptive MCMC algorithm since it is adaptive both in its proposal and
relabeling mechanisms. This means that, besides the proposal distribution, its target also changes
with the number of iterations. In Section 3 we will prove that, at each iteration $t$, AMOR
implements a random walk Metropolis-Hastings kernel with stationary distribution $\pi_{\theta_{t-1}} \propto \pi\,\mathbb{1}_{V_{\theta_{t-1}}}$.
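To make the steps of Algorithm 1 concrete, here is a minimal NumPy sketch. It is not the authors' code: the function names, the log-density interface `log_pi`, the step size $\gamma_t = (t+1)^{-\beta}$ (chosen so that the covariance stays positive definite from the first iteration), the scaling $c = 2.38^2/d$, and the initial values are all assumptions made for illustration. The target is the symmetrized two-dimensional Gaussian described in the caption of Figure 1.

```python
import numpy as np

def gauss_logpdf(y, mean, cov_inv, logdet):
    """Log-density of a Gaussian given its inverse covariance and log-determinant."""
    d = y - mean
    return -0.5 * (d @ cov_inv @ d + logdet + len(y) * np.log(2.0 * np.pi))

def log_sum_over_perms(y, mean, cov_inv, logdet, perms):
    """log of sum_P N(P y | mean, cov)."""
    return np.logaddexp.reduce([gauss_logpdf(P @ y, mean, cov_inv, logdet) for P in perms])

def amor(log_pi, x0, n_iter, mu0, sigma0, c, perms, rng, beta=1.0):
    x = np.asarray(x0, dtype=float)
    mu = np.asarray(mu0, dtype=float)
    sigma = np.asarray(sigma0, dtype=float)
    sample = np.empty((n_iter, x.size))
    for t in range(1, n_iter + 1):
        prop_cov = c * sigma                                    # Step 3
        cov_inv = np.linalg.inv(prop_cov)
        logdet = np.linalg.slogdet(prop_cov)[1]
        y = rng.multivariate_normal(x, prop_cov)                # Step 4
        # Step 5: criterion (2.1); scaling Sigma by c does not change the argmin.
        crit = np.array([(P @ y - mu) @ cov_inv @ (P @ y - mu) for P in perms])
        ties = np.flatnonzero(np.isclose(crit, crit.min()))
        y = perms[rng.choice(ties)] @ y                         # Step 6, uniform over ties
        log_ratio = (log_pi(y) + log_sum_over_perms(x, y, cov_inv, logdet, perms)
                     - log_pi(x) - log_sum_over_perms(y, x, cov_inv, logdet, perms))
        if np.log(rng.uniform()) < log_ratio:                   # Steps 7-10
            x = y
        sample[t - 1] = x                                       # Step 11
        gamma = (t + 1.0) ** (-beta)                            # assumed step size
        diff = x - mu
        mu = mu + gamma * diff                                  # Step 12
        sigma = sigma + gamma * (np.outer(diff, diff) - sigma)  # Step 13
    return sample, mu, sigma

# Target: equal-weight mixture of the seed Gaussian and its coordinate-swapped copy.
seed_mean = np.array([0.0, 2.0])
seed_cov = np.array([[16.0, -0.975], [-0.975, 1.0]])
seed_inv = np.linalg.inv(seed_cov)
seed_logdet = np.linalg.slogdet(seed_cov)[1]
perms = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]

def log_pi(x):
    return log_sum_over_perms(x, seed_mean, seed_inv, seed_logdet, perms) - np.log(2.0)

rng = np.random.default_rng(1)
sample, mu_T, sigma_T = amor(log_pi, x0=[0.0, 2.0], n_iter=2000, mu0=[0.0, 2.0],
                             sigma0=np.eye(2), c=2.38 ** 2 / 2, perms=perms, rng=rng)
print(mu_T)
print(sigma_T)
```

The acceptance ratio is evaluated in log scale for numerical stability; the sums over permutations in Step 7 are what distinguish it from a plain random walk Metropolis ratio.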




Fig 1. Panel 1(a) shows the target distribution $\pi$ used in Section 2.2, obtained by symmetrizing the Gaussian
$\pi_{\text{seed}}$ shown in Panel 1(b). $\pi_{\text{seed}}$ has mean (0, 2) and covariance matrix with diagonal (16, 1) and non-diagonal
terms equal to −0.975.



^1 Step 5 usually boils down to selecting the permutation $P$ that minimizes $L_{\theta_{t-1}}(P\tilde X)$. In case of ties, however, $P$
should be drawn uniformly over the set on which the minimum is achieved.

2.2. An illustrative example

In this section, we consider an artificial target aimed at illustrating the gap in performance
between the AMOR algorithm and other common approaches to the label switching problem
which are compatible with adaptive MCMC. Consider the two-dimensional pdf $\pi$ depicted in
Figure 1(a), which satisfies $\pi(x) = \pi(Px)$ for $P \in \mathcal{P}$, where

$$\mathcal{P} = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right\}.$$

The density $\pi$ is a mixture of two densities with equal weights obtained by superposing the
Gaussian pdf $\pi_{\text{seed}}$ represented in Figure 1(b) with a symmetrized version of itself. This artificial
target does not correspond to the posterior distribution in an actual inference problem. In
particular, although $\pi$ itself is a mixture, it is not the posterior distribution of the parameters of
any specific mixture model. Nevertheless, it is relevant because it is permutation invariant and
the desired solution of the label switching problem is well-defined: we know that, under suitable
relabeling, we can obtain univariate near-Gaussian marginals for both coordinates by recovering
the marginals of the two-dimensional Gaussian $\pi_{\text{seed}}$ in Figure 1(b). In spite of its simplicity,
this example is challenging because the two marginals of $\pi_{\text{seed}}$ have similar means (0 and 2)
and one has large variance, which makes them hard to separate. Given the modest dimension
of the problem, we fix the number of MCMC iterations to 20 000, of which 4000 are discarded
as burn-in. For each algorithm, we assess the quality of the relabeling strategy by looking at
the corresponding restriction $\pi'$ of the target $\pi$, and we assess the efficiency of the sampling by
plotting the autocorrelation function of each sample and comparing the sample histograms with
the marginals of $\pi'$.

The results obtained when applying AM, without any relabeling, are shown in Figure 2.
The marginal posteriors are sampled quite well (Figures 2(c) and 2(d)) and the covariance
of the joint sample (indicated by a thick ellipse in Figure 2(a)) is almost symmetric. This is not
surprising: the joint distribution, although severely non-Gaussian, is unimodal, and the number
of iterations is large enough for AM to explore both the original seed $\pi_{\text{seed}}$ and its symmetric
version by frequent label switching. On the other hand, the covariance of the joint distribution $\pi$
(Figure 1(a)) is broader than the covariance of the seed $\pi_{\text{seed}}$ (Figure 1(b)). This results in poor
adaptive proposals and slow mixing, as indicated by the slight differences between the marginals
and the sample marginals, and by the autocorrelation function of the first component of the
sample in Figure 2(b). The reference (dashed line) is the autocorrelation function of an MCMC
chain with optimal covariance (proportional to the covariance of the target) targeting the single
Gaussian $\pi_{\text{seed}}$ (Figure 1(b)).
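The claim that the covariance of $\pi$ is broader than that of the seed can be checked directly from the parameters in the caption of Figure 1. The following sketch assumes, as described above, that $\pi$ is the equal-weight mixture of the seed Gaussian and its coordinate-swapped copy, and applies the law of total covariance; the variable names are illustrative.

```python
import numpy as np

seed_mean = np.array([0.0, 2.0])
seed_cov = np.array([[16.0, -0.975], [-0.975, 1.0]])
swap = np.array([[0.0, 1.0], [1.0, 0.0]])

# The swapped component is Gaussian with mean P m and covariance P C P^T.
means = [seed_mean, swap @ seed_mean]
covs = [seed_cov, swap @ seed_cov @ swap.T]

# Law of total covariance for an equal-weight two-component mixture:
# average within-component covariance plus covariance of the component means.
pi_mean = np.mean(means, axis=0)
between = np.mean([np.outer(m - pi_mean, m - pi_mean) for m in means], axis=0)
pi_cov = np.mean(covs, axis=0) + between

print(pi_mean)   # both coordinates share the same mean
print(pi_cov)    # both coordinates share the same variance
print(np.linalg.det(pi_cov) / np.linalg.det(seed_cov))
```

Without relabeling, both coordinates have mean 1 and variance 9.5, so AM adapts to a symmetric covariance whose determinant is several times that of the seed covariance, which is consistent with the slow mixing reported for Figure 2.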

We now consider a modified version of AM with online relabeling obtained by simply ordering
the variables, meaning that after each proposal $x = (x_1, x_2)$, the components of the proposed
point are permuted so that $x_1 < x_2$. This strategy is known as imposing an identifiability
constraint. It is known to perform badly when the constraint does not respect the topology of
the target [19]. The results of this approach on our illustrative example are shown in Figure 3.
The unshaded triangle in Figure 3 shows that this time the sample is restricted to a subregion
of $\mathbb{R}^2$ where the components are identifiable. Unfortunately, the marginals of $\pi$ restricted to the
unshaded triangle in Figures 3(c) and 3(d) are even more highly skewed than the marginals
of the full joint distribution $\pi$. In addition, sampling from the restricted distribution $\pi'$ is not
easier than before, as indicated by the autocorrelation function in Figure 3(b).

Applying the ordering constraint after the full sample has been drawn with AM leads to
similar results, as shown in Figure 4. This shows that the problem lies with the relabeling
criterion rather than with the online nature of the relabeling procedure.
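One way to see why the ordering criterion distorts the marginals on this example is to compute how much of the seed Gaussian violates the constraint. The sketch below uses the seed parameters from the caption of Figure 1 and the fact that a linear combination of jointly Gaussian coordinates is Gaussian; the variable names are illustrative.

```python
import math
import numpy as np

seed_mean = np.array([0.0, 2.0])
seed_cov = np.array([[16.0, -0.975], [-0.975, 1.0]])

# Under the seed, x1 - x2 is Gaussian with the mean and variance below.
contrast = np.array([1.0, -1.0])
diff_mean = contrast @ seed_mean
diff_var = contrast @ seed_cov @ contrast

# P(x1 > x2) = Phi(diff_mean / sqrt(diff_var)), written with the error function.
p_violate = 0.5 * (1.0 + math.erf(diff_mean / math.sqrt(2.0 * diff_var)))
print(round(p_violate, 3))
```

Roughly a third of the seed mass lies on the side where $x_1 > x_2$, so ordering the coordinates cuts through the seed rather than separating it from its swapped copy.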

Fig 2. Results of vanilla AM on the two-dimensional target $\pi$ of Figure 1. The rest of the caption is the same
for Figures 3 to 6. On Panel 2(a), level lines of $\pi$ are depicted in thin black lines; a thick ellipse centered at the
empirical mean $\mu_T$ of the sample $S$ indicates the set $\{x : (x - \mu_T)^T \Sigma_T^{-1} (x - \mu_T) = 1\}$, where $\Sigma_T$ is the sample
covariance. When appropriate, the region of the space selected by (the last iteration of) the algorithm corresponds
to the unshaded background while the region not selected is shaded. On Panel 2(b), the autocorrelation function
(ACF) of the first component of $S$ is plotted as a solid line. The dashed line indicates the ACF obtained
when sampling from the seed Gaussian $\pi_{\text{seed}}$ of Figure 1(b) using a random walk Metropolis algorithm with an
optimally tuned covariance matrix. Panels 2(c) and 2(d) display the histograms of the two marginal samples.
The solid curves are the marginals of $\pi$ in this figure. In Figures 3 to 6, they are the marginals of $\pi$ restricted
to the unshaded region selected by the algorithms.

Fig 3. Results of AM with online ordering constraint. For details about the plots, see the caption of Figure 2.

Next, we consider the approach introduced by Celeux in [9]. Celeux's algorithm builds on
a non-adaptive random-walk Metropolis, where online relabeling is performed in the following
way: when a point $x = (x^{(1)}, x^{(2)})$ is proposed at time $t$, it is relabeled by $\tilde P x$, with

$$\tilde P \in \arg\min_{P \in \mathcal{P}} (Px - \mu_t)^T D_t^{-1} (Px - \mu_t), \tag{2.3}$$

where $\mu_t = (\mu_t^{(1)}, \mu_t^{(2)})$ is the empirical mean of the current sample $x_{1:t} = x_1, \ldots, x_t$ and $D_t$ is
the diagonal matrix containing the empirical variances of the coordinates of $x_{1:t}$ on its diagonal.
Formally, this relabeling rule is equivalent to Steps 5 and 6 of Algorithm 1, but with all non-diagonal
elements of $\Sigma$ equal to zero. The results of Celeux's algorithm are shown in Figure 5.
It is hard to determine precisely the formal target of the algorithm. In particular, given
the non-isotropic shape of the target, we used a non-isotropic Gaussian proposal with diagonal
covariance matrix, and while the preservation of the detailed balance condition then requires
incorporating a term into the acceptance ratio to account for the relabeling, it is absent in this
approach. It is still possible that the algorithm is approximately sampling, in a certain sense, from
the restriction $\pi'$ of $\pi$ to the unshaded area in Figure 5 (which represents the relabeling rule
implemented at the end of the run). The histograms in Figures 5(c) and 5(d) are in agreement
with the solid line marginals. However, there are no formal guarantees that this should happen.
On the other hand, in Section 3 we prove the corresponding claim for the AMOR algorithm.
Fig 4. Results of AM with ordering constraint applied as post-processing. For details about the plots, see the
caption of Figure 2.

This relabeling strategy seems to recover $\pi_{\text{seed}}$ better than the mere ordering of coordinates,
as suggested by the marginal plots in Figures 5(c) and 5(d), which are less skewed and now
roughly centered at the correct values (0 and 2, respectively). However, using a diagonal covariance
$D_t$ also generates some distortion which results in a severely non-Gaussian, bimodal
marginal in Figure 5(c). Because of these imperfections and due to the uncorrelated proposal,
the autocorrelation in Figure 5(b) indicates, again, a much less efficient sampling than in the
case of an optimal Metropolis chain targeting $\pi_{\text{seed}}$.

The significance of Celeux's algorithm is that its adaptive relabeling rule (2.3) makes it
possible to resolve the permutation invariance problem in a non-trivial way which appears to
be more adapted to the true geometry of the target. It is still not perfect, and, as suggested by
[32], one should replace the diagonal covariance matrix in (2.3) by the full covariance matrix
of the sample. However, [32] explored this idea only as a post-processing approach. A severe
difficulty in this context is the computational cost: if $T$ denotes the number of drawn samples
and $p$ is the number of permutations to which $\pi$ is invariant, the required post-processing is a
combinatorial problem with $p^T$ possible relabelings. This eventually led [32] to consider a more
tractable alternative instead. More importantly in our context, we have seen above (e.g., in
Figure 2) that running an adaptive MCMC on the full permutation-invariant target may result
in a poor mixing performance. To achieve both relevant relabeling and efficient adaptivity, the
key idea of AMOR is to link the covariance of the proposal distribution and the covariance used
for relabeling, which are proportional to each other in AMOR.

Fig 5. Results of Celeux's algorithm. For details about the plots, see the caption of Figure 2.

Figure 6 displays the results obtained using AMOR on our running example. AMOR
separates $\mathbb{R}^2$ into two regions that respect the topology of the target much more closely than the
approaches examined previously. Figure 6(a) indicates that the relabeled target is as Gaussian
as possible among all partitionings based on a quadratic criterion of the form (2.1). The
marginal histograms in Figures 6(c) and 6(d) now look almost Gaussian. They closely match the
marginals of both the restricted distribution $\pi'$ and the seed distribution $\pi_{\text{seed}}$ in Figure 1(b).
Furthermore, the autocorrelation function of AMOR (Figure 6(b)) is as good as the reference
autocorrelation function corresponding to an optimally tuned random walk Metropolis-Hastings
algorithm targeting the seed Gaussian $\pi_{\text{seed}}$ in Figure 1(b). This perfect adaptation is possible
because the sample covariance now matches the covariance of the target restricted to the
unshaded region of the plane (Figure 6(a)).

Fig 6. Results of AMOR. For details about the plots, see the caption of Figure 2.

On this example, the AMOR algorithm thus automatically achieves, without any tuning, a 
satisfactory result that cannot be obtained with any of the methods examined previously. We are 
now ready to prove our main result which shows that, under suitable conditions, a stable version 
of AMOR indeed asymptotically samples from the target distribution restricted to a region on 
which the marginals are identifiable, and that the sample mean and covariance converge to the 
corresponding moments of the restricted target. 
3. Convergence results

AMOR can be cast into the family of adaptive MCMC algorithms in which the updating rule
of the design parameter relies on a stochastic approximation scheme. Adaptive MCMC can
be described as follows: given a family of transition kernels $(P_\theta)_{\theta \in \Theta}$, the algorithm produces
a $(\mathbb{X} \times \Theta)$-valued process $((X_t, \theta_t))_{t \ge 0}$ such that the conditional distribution of $X_t$ given the
past is given by the transition kernel $P_{\theta_{t-1}}$. This algorithm is designed so that when $t$ tends
to infinity, the distribution of $X_t$ converges to the invariant distribution of the kernel $P_{\theta_t}$.
Convergence of such adaptive procedures was recently analyzed by [27, 12]. In particular, [27]
provided sufficient conditions in terms of the so-called containment condition and diminishing
adaptation. Furthermore, [12] showed that when each transition kernel $P_\theta$ has its own invariant
distribution, a condition on the convergence of these distributions is also required.

In Section 3.1, we will show that each transition kernel of AMOR has its own invariant
distribution. Therefore, as a preliminary step for the convergence of AMOR, the stability and
the convergence of the design parameter sequence $(\theta_t)_{t \ge 0}$ have to be established. Sufficient
conditions for the convergence of stochastic approximation procedures rely on the existence of a
(sufficiently regular) Lyapunov function on $\Theta$, on the behavior of the mean field at the boundary
of the parameter set $\Theta$, and on the magnitude of the stepsize sequence $(\gamma_t)_{t \ge 0}$. For Algorithm 1,
we were only able to design a Lyapunov function for which some boundaries of $\Theta$ are not
repulsive [5]. Therefore, we introduce in this paper a stable AMOR algorithm (Algorithm 2),
which differs from Algorithm 1 in the update rules 12 and 13. In particular, we add (i) a penalty
in Steps 12 and 13 to make the boundaries of $\Theta$ repulsive, and (ii) a stabilization step to ensure
that the sequence $(\theta_t)_{t \ge 0}$ is bounded.

We prove the convergence of the stable AMOR algorithm under the condition that the support
of $\pi$ is compact.

Assumption 1. $\pi$ is a density w.r.t. the Lebesgue measure on $\mathbb{R}^d$, which is bounded and with
compact support $\mathbb{X}$, and which is invariant to permutations in the group $\mathcal{P}$:

$$\forall x \in \mathbb{X},\ \forall P \in \mathcal{P},\quad \pi(Px) = \pi(x).$$

The compactness assumption makes it simpler to analyze the limiting behavior of the algorithm.
The proofs can be extended to a more general case by using the same tools as in [12] and [1,
Section 3]. These technical steps are out of the scope of this paper.

This section is organized as follows. In Section 3.1, we first describe the stable AMOR algorithm,
and we show that it is an adaptive MCMC algorithm. We then characterize the limiting
behavior of the sequence $(\theta_t)_{t \ge 0}$ in Section 3.2 and address a strong law of large numbers for the
samples $(X_t)_{t \ge 0}$, as well as the ergodicity of the sampler. All proofs are given in the Appendix.

3.1. A stable AMOR algorithm 

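Set $\mathcal{P}^* = \mathcal{P} \setminus \{\mathrm{Id}\}$ and

$$\Theta = \{(\mu, \Sigma) \in \mathbb{R}^d \times \mathcal{C}_d^+ : \forall P \in \mathcal{P}^*,\ \Sigma^{-1}\mu \ne P\Sigma^{-1}\mu\}. \tag{3.1}$$

The set $\mathbb{R}^d \times \mathcal{C}_d^+$ is endowed with the scalar product $\langle (a, A), (b, B) \rangle = a^T b + \mathrm{Trace}(A^T B)$. We
will use the same notation $\|\cdot\|$ for the norm induced by this scalar product, for the Euclidean
norm on $\mathbb{R}^d$, and for the norm $\|A\| = \mathrm{Tr}(A^T A)^{1/2}$ on $d \times d$ real matrices.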
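Denote by $\mathcal{S}_d$ the set of $d \times d$ symmetric real matrices; and for $P \in \mathcal{P}$, let $U_P = (I - P)^T (I - P)$.
Let $a > 0$ be fixed and define $H : \mathbb{X} \times \Theta \to \mathbb{R}^d \times \mathcal{S}_d$ by

$$H(x, \theta) = (H_\mu(x, \theta), H_\Sigma(x, \theta)), \tag{3.2}$$

where

$$H_\Sigma(x, \theta) = (x - \mu)(x - \mu)^T \ldots$$

Finally, for any $\delta > 0$, set

$$\mathcal{K}_\delta = \Big\{(\mu, \Sigma) \in \Theta : \min_{P \in \mathcal{P}^*} \|(I - P)\Sigma^{-1}\mu\| \ge \delta\Big\}. \tag{3.3}$$

Let $(\delta_q)_{q \ge 0}$ be a decreasing positive sequence such that $\lim_{q \to \infty} \delta_q = 0$ and $\mathcal{K}_{\delta_0}$ is not empty;
choose $\theta_0 = (\mu_0, \Sigma_0) \in \mathcal{K}_{\delta_0}$. Algorithm 2 describes the stable AMOR algorithm in pseudocode.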

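Algorithm 2.

STABLEAMOR($\pi(\cdot)$, $X_0$, $T$, $\theta_0 = (\mu_0, \Sigma_0)$, $c$, $(\gamma_t)_{t \ge 0}$, $a$, $(\delta_q)_{q \ge 0}$)

 1  $S \leftarrow \emptyset$
 2  $\psi \leftarrow 0$    ▷ Projection counter
 3  for $t \leftarrow 1$ to $T$
 4      $\Sigma \leftarrow c\,\Sigma_{t-1}$    ▷ scaled adaptive covariance
 5      $\tilde X \sim \mathcal{N}(\cdot \mid X_{t-1}, \Sigma)$    ▷ proposal
 6      $\tilde P \sim \arg\min_{P \in \mathcal{P}} L_{\theta_{t-1}}(P\tilde X)$    ▷ pick an optimal permutation
 7      $\tilde X \leftarrow \tilde P \tilde X$    ▷ permute
 8      if $\dfrac{\pi(\tilde X) \sum_{P \in \mathcal{P}} \mathcal{N}(P X_{t-1} \mid \tilde X, \Sigma)}{\pi(X_{t-1}) \sum_{P \in \mathcal{P}} \mathcal{N}(P \tilde X \mid X_{t-1}, \Sigma)} > \mathcal{U}_{[0,1]}$ then
 9          $X_t \leftarrow \tilde X$    ▷ accept
10      else
11          $X_t \leftarrow X_{t-1}$    ▷ reject
12      $S \leftarrow S \cup \{X_t\}$    ▷ update posterior sample
13      $\mu_t \leftarrow \mu_{t-1} + \gamma_t H_\mu(X_t, \theta_{t-1})$
14      $\Sigma_t \leftarrow \Sigma_{t-1} + \gamma_t H_\Sigma(X_t, \theta_{t-1})$
15      if $(\mu_t, \Sigma_t) \notin \mathcal{K}_{\delta_\psi}$ then
16          $(\mu_t, \Sigma_t) \leftarrow (\mu_0, \Sigma_0)$    ▷ Project back to $\mathcal{K}_{\delta_0}$
17          $\psi \leftarrow \psi + 1$    ▷ Increment projection counter
18      $\theta_t \leftarrow (\mu_t, \Sigma_t)$
19  return $S$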



To prevent the new value $(\mu_t, \Sigma_t)$ from moving into the set $\{\theta \in \Theta : \inf_{P \in \mathcal{P}^*} \|(I - P)\Sigma^{-1}\mu\| = 0\}$,
we modify the updates of $\mu$ and $\Sigma$ in Steps 13 and 14 (Steps 12 and 13 in Algorithm 1), and
add a projection mechanism in Steps 15 to 17.
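The projection mechanism of Steps 15 to 17 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it takes membership in the set of (3.3) to mean that $\Sigma$ is positive definite and that the smallest value of $\|(I - P)\Sigma^{-1}\mu\|$ over non-identity permutations is at least $\delta$, and the names `margin`, `stabilize`, and `delta_of` are hypothetical. The penalized updates of Steps 13 and 14 are not reproduced here.

```python
import numpy as np

def margin(mu, sigma, nontrivial_perms):
    """Smallest ||(I - P) Sigma^{-1} mu|| over the non-identity permutations."""
    v = np.linalg.solve(sigma, mu)          # Sigma^{-1} mu
    return min(np.linalg.norm(v - P @ v) for P in nontrivial_perms)

def stabilize(mu, sigma, theta0, delta_of, psi, nontrivial_perms):
    """Steps 15-17: reset to theta_0 and increment psi if theta left the safe set."""
    is_pd = np.all(np.linalg.eigvalsh(sigma) > 0)
    if not is_pd or margin(mu, sigma, nontrivial_perms) < delta_of(psi):
        return theta0[0], theta0[1], psi + 1
    return mu, sigma, psi

def delta_of(q):
    """Hypothetical decreasing positive sequence with limit zero."""
    return 0.5 / (q + 1)

swap = np.array([[0.0, 1.0], [1.0, 0.0]])
theta0 = (np.array([0.0, 2.0]), np.eye(2))

# A parameter well inside the safe set is left untouched.
kept = stabilize(np.array([0.1, 1.9]), np.eye(2), theta0, delta_of, 0, [swap])
# A parameter with Sigma^{-1} mu invariant under the swap triggers a reset.
reset = stabilize(np.array([1.0, 1.0]), np.eye(2), theta0, delta_of, 0, [swap])
print(kept[2], reset[2])
```

Because the thresholds decrease with the counter, each reset enlarges the admissible set, which is the property used in Section 3.2 to show that only finitely many resets occur.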
We now prove that stable AMOR is an adaptive MCMC algorithm. For any $\theta \in \Theta$, define
the transition kernel $P_\theta$ on $(\mathbb{X}, \mathcal{X})$ by

$$P_\theta(x, A) = \int_{A \cap V_\theta} \alpha_\theta(x, y)\, q_\theta(x, y)\, dy + \mathbb{1}_A(x) \int_{V_\theta} \big(1 - \alpha_\theta(x, z)\big)\, q_\theta(x, z)\, dz, \tag{3.4}$$

where $V_\theta$ is given by (2.2),

$$\alpha_\theta(x, y) = 1 \wedge \frac{\pi(y)\, q_\theta(y, x)}{\pi(x)\, q_\theta(x, y)}, \tag{3.5}$$

and

$$q_\theta(x, y) = \sum_{P \in \mathcal{P}} \mathcal{N}(Py \mid x, c\Sigma). \tag{3.6}$$

For $\theta \in \Theta$, define also

$$\pi_\theta \propto \pi\, \mathbb{1}_{V_\theta}. \tag{3.7}$$

The following proposition shows that $q_\theta(x, \cdot)$ is a density on $V_\theta$ and that the distribution $\pi_\theta$ given
by (3.7) is invariant for the transition kernel $P_\theta$. It also establishes that stable AMOR is an
adaptive MCMC algorithm: given $(X_{t-1}, \theta_{t-1})$, $X_t$ is obtained by one iteration of a random
walk Metropolis-Hastings algorithm with proposal $q_{\theta_{t-1}}$ and invariant distribution $\pi_{\theta_{t-1}}$.

Proposition 3.1. Under Assumption 1, the following assertions hold:

1. For any $\theta \in \Theta$ and $x \in \mathbb{X}$, $\int_{V_\theta} q_\theta(x, y)\, dy = 1$.

2. For any $\theta \in \Theta$, $\pi_\theta P_\theta = \pi_\theta$ and for any $x \in V_\theta$, $P_\theta(x, V_\theta) = 1$.

3. Let $(\theta_t, X_t)_{t \ge 0}$ be given by Algorithm 2. Conditionally on $\sigma(X_0, \theta_0, X_1, \theta_1, \ldots, X_{t-1}, \theta_{t-1})$,
the distribution of $X_t$ is $P_{\theta_{t-1}}(X_{t-1}, \cdot)$.

Note that the proof of Proposition 3.1 is independent of the update scheme of $(\theta_t)_{t \ge 0}$, which
makes the proposition valid for both Algorithms 1 and 2.
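The first assertion of Proposition 3.1 can be checked numerically in the two-dimensional case with the swap permutation. The sketch below assumes that the proposal density is the sum over permutations of Gaussian densities centered at $x$ with covariance $c\Sigma$, and that the Voronoi cell is the one defined in (2.2); the parameter values and the grid are arbitrary choices made for illustration. Grid points where the two criteria tie (the diagonal) receive weight one half, since the tie set has Lebesgue measure zero.

```python
import numpy as np

def quad_form(d, a):
    """d^T a d for a symmetric 2x2 matrix a, evaluated over the last axis of d."""
    return (a[0, 0] * d[..., 0] ** 2 + 2.0 * a[0, 1] * d[..., 0] * d[..., 1]
            + a[1, 1] * d[..., 1] ** 2)

def gauss_pdf(y, mean, cov):
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad_form(y - mean, np.linalg.inv(cov))) / norm

# Arbitrary illustrative values for theta = (mu, Sigma), the scaling c, and the state x.
mu = np.array([0.0, 2.0])
sigma = np.array([[4.0, -0.5], [-0.5, 1.0]])
c = 1.0
x = np.array([0.3, 1.5])

# Square grid, identical on both axes, so that it is stable under the swap.
h = 0.05
grid = np.arange(-12.0, 12.0 + h / 2, h)
Y = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1)
PY = Y[..., ::-1]                                  # coordinates swapped

# Indicator of the Voronoi cell: criterion at y no larger than at P y.
sigma_inv = np.linalg.inv(sigma)
crit = quad_form(Y - mu, sigma_inv)
crit_p = quad_form(PY - mu, sigma_inv)
weight = np.where(crit < crit_p, 1.0, np.where(crit == crit_p, 0.5, 0.0))

# Proposal density summed over the two permutations, integrated over the cell.
q = gauss_pdf(Y, x, c * sigma) + gauss_pdf(PY, x, c * sigma)
integral = float(np.sum(weight * q) * h * h)
print(integral)
```

The sum over permutations compensates exactly for the mass that the Gaussian proposal places outside the cell, which is why the restricted density still integrates to one.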

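3.2. Convergence of stable AMOR

Let

$$\mu_{\pi_\theta} = \int x\, \pi_\theta(x)\, dx, \tag{3.8}$$

$$\Sigma_{\pi_\theta} = \int (x - \mu_{\pi_\theta})(x - \mu_{\pi_\theta})^T\, \pi_\theta(x)\, dx, \tag{3.9}$$

be the expectation and covariance matrix of $\pi_\theta$, respectively. Define the mean field $h : \Theta \to
\mathbb{R}^d \times \mathcal{S}_d$ by

$$h(\theta) = (h_\mu(\theta), h_\Sigma(\theta)), \tag{3.10}$$

where

$$h_\Sigma(\theta) = \Sigma_{\pi_\theta} - \Sigma + (\mu_{\pi_\theta} - \mu)(\mu_{\pi_\theta} - \mu)^T$$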
The key ingredient for the proof of the convergence of the sequence $(\theta_t)_{t \ge 0}$ is the existence of a
Lyapunov function $w$ for the mean field $h$: we prove in the Appendix (see Lemma 5.6) that the
function $w : \Theta \to \mathbb{R}_+$, defined by

$$w(\theta) = -\int \log \mathcal{N}(x \mid \theta)\, \pi_\theta(x)\, dx + \big[\text{penalty term in } \|(I - P)\Sigma^{-1}\mu\|,\ P \in \mathcal{P}^*\big], \tag{3.11}$$

is continuously differentiable on $\Theta$ and satisfies $\langle \nabla w, h \rangle \le 0$. In addition, $\langle \nabla w(\theta), h(\theta) \rangle = 0$ iff
$\theta$ is in the set

$$\mathcal{L} = \{\theta \in \Theta : h(\theta) = 0\} = \{\theta \in \Theta : \nabla w(\theta) = 0\}. \tag{3.12}$$

The convergence of the sequence $(\theta_t)_{t \ge 0}$ is proved by verifying the sufficient conditions for the
convergence of the stochastic approximation for Lyapunov stable dynamics given in [1]. The
first step is to prove that the sequence is bounded with probability one: we prove that, almost
surely, the number of projections $\psi$ is finite so that the projection mechanism (Steps 15 to 17
in Algorithm 2) never occurs after a (random) finite number of iterations. We then prove the
convergence of the stable sequence. To achieve that goal, following the same lines as in [1], we
make the following assumption.

Assumption 2. Let $\mathcal{L}$ be given by (3.12). There exists $M_\star > 0$ such that $\mathcal{L} \subset \{\theta : w(\theta) < M_\star\}$,
and $w(\mathcal{L})$ has an empty interior.

For $x \in \mathbb{R}^d$ and $A \subset \mathbb{R}^d$, define $d(x, A) = \inf_{a \in A} \|x - a\|$. The following result is proved in
the Appendix.

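Theorem 3.2. Let $\beta \in (1/2, 1]$ and $\gamma_\star > 0$. Let $(\theta_t)_{t \ge 0}$ be the sequence produced by Algorithm 2
with $\gamma_t \sim \gamma_\star t^{-\beta}$ when $t \to +\infty$. Under Assumptions 1 and 2,

1. Almost surely, there exist $M > 0$ and $t_\star > 0$ such that for any $t \ge t_\star$, $\theta_t \in \{\theta \in \Theta : w(\theta) \le M\}$.
In addition, the number of projections is finite almost surely.

2. Almost surely, $(w(\theta_t))_t$ converges to $w_\star \in w(\mathcal{L})$ and $\limsup_t d(\theta_t, \mathcal{L}_{w_\star}) = 0$, where $\mathcal{L}_{w_\star} =
\{\theta \in \mathcal{L} : w(\theta) = w_\star\}$.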

Theorem 3.2 states the convergence of $(\theta_t)_{t \ge 0}$ to the set $\mathcal{L}$ of the zeros of $h$; note that this set
depends neither on the initial values $(\theta_0, X_0)$ nor on other design parameters. In our experiments,
we always observed pointwise convergence. We now state a strong law of large numbers for the
samples $(X_t)_{t \ge 0}$, which holds for all paths such that $(\theta_t)_t$ converges to a point $\theta_\star \in \mathcal{L}$.

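Theorem 3.3. Let $\beta \in (1/2, 1]$, $\gamma_\star > 0$, and $\theta_\star \in \mathcal{L}$. Let $(X_t, \theta_t)_{t\ge 0}$ be the sequence generated
by Algorithm 2 with $\gamma_t \sim \gamma_\star t^{-\beta}$ when $t \to +\infty$. Under Assumptions 1 and 2, on the set
$\{\lim_t \theta_t = \theta_\star\}$, almost surely,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} f(X_t) = \pi_{\theta_\star}(f)$$

for any bounded function $f$.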

Finally, Theorem 3.4 yields the ergodicity of AMOR. 
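Theorem 3.4. Let $\beta \in (1/2, 1]$, $\gamma_\star > 0$, and $\theta_\star \in \mathcal{L}$. Let $(X_t, \theta_t)_{t\ge 0}$ be the sequence generated
by Algorithm 2 with $\gamma_t \sim \gamma_\star t^{-\beta}$ when $t \to +\infty$. Under Assumptions 1 and 2,

$$\lim_{t \to \infty} \sup_{\|f\|_\infty \le 1} \left| \mathbb{E}\left[ f(X_t)\, 1_{\lim_q \theta_q = \theta_\star} \right] - \pi_{\theta_\star}(f)\, \mathbb{P}\left( \lim_q \theta_q = \theta_\star \right) \right| = 0 .$$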



The expression (3.11) of $w$ provides insight into the links between relabeling and vector
quantization [13]. The first term is similar to a distortion measure in vector quantization, as noted
in [5]. It can also be seen as the cross-entropy between $\pi_\theta$ and a Gaussian with parameters $\theta$. The
second term in (3.11) is similar to a barrier penalty in continuous optimization [7]. From this
perspective, Algorithm 2 can be seen as a constrained optimization procedure that minimizes the
cross-entropy. In that sense, if $\theta_\star$ denotes a solution to this optimization problem, the relabeled
target $\pi_{\theta_\star} \propto 1_{V_{\theta_\star}} \pi$ is the restriction of $\pi$ to one of its symmetric modes $V_{\theta_\star}$ that looks as
Gaussian as possible among all such restrictions.
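
The link with vector quantization can be illustrated by the relabeling rule itself: among all permuted copies of a sample, keep the one with the smallest quadratic (Mahalanobis) loss with respect to the current parameters. The sketch below is a hedged illustration, not the paper's Algorithm 2. It assumes that the permutation group is the full symmetric group acting on coordinates, and the helper names `quadratic_loss` and `relabel` are hypothetical.

```python
import itertools
import numpy as np

def quadratic_loss(x, mu, sigma_inv):
    """Quadratic loss (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ sigma_inv @ diff)

def relabel(x, mu, sigma):
    """Return the permuted copy of x with the smallest quadratic loss.

    Assumption: the group is all coordinate permutations, enumerated by
    brute force, so this is only practical in small dimension.
    """
    sigma_inv = np.linalg.inv(sigma)
    candidates = (x[list(p)] for p in itertools.permutations(range(len(x))))
    return min(candidates, key=lambda y: quadratic_loss(y, mu, sigma_inv))

mu = np.array([-1.0, 0.0, 2.0])
sigma = np.diag([0.5, 0.5, 0.5])
x = np.array([2.1, -0.9, 0.2])
relabeled = relabel(x, mu, sigma)
```

In this toy case the covariance is isotropic, so the selected copy is simply the one closest to `mu` in Euclidean distance. The brute-force enumeration also makes visible the cost of sweeping over the whole group at each iteration.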

Vector quantization algorithms have already been investigated using stochastic approximation
tools [21]. However, stability was guaranteed in previous work by making strong assumptions
on the trajectories of the process $(\theta_t)_{t\ge 0}$, such as in [21, Theorem 32]; see also [21, Results 33
to 37 & Remark 38]. These assumptions ensure that $(\theta_t)$ stays asymptotically away from sets
where the function used elsewhere as a Lyapunov function is not differentiable. In this paper,
we adopt a different strategy by introducing the modifications of the stable AMOR algorithm
and adding a barrier term in the definition of our Lyapunov function (3.11) that penalizes these
sets. One of the contributions of this paper is to show that this penalization strategy leads to a
stable algorithm, without requiring any strong assumption on $(\theta_t)$.



4. Conclusion 



We illustrated AMOR, an adaptive Metropolis algorithm with online relabeling that we previously
proposed in [5], and proved a strong law of large numbers for a stable version of AMOR.
Our algorithm adapts both its proposal and its target on the fly, which makes it a turn-key
algorithm. Our results lead to a sound characterization of the target of AMOR that depends
neither on the initialization of the algorithm nor on the user. This is the first theoretical analysis
of an online relabeling algorithm to our knowledge. The proof further shows how relabeling is
related to vector quantization. Unlike previous work on stochastic approximation schemes for
vector quantization, we make no strong assumptions on the trajectories of the process considered;
rather, we ensure that the appropriate constraint is satisfied by introducing penalization
directly into the stochastic approximation framework.

We now examine possible directions for future work. First, following our analysis in Section 3,
the question of the control of the convergence of AMOR arises, and proving a central limit theorem
would be a natural next step. Second, the online nature of AMOR makes it cheaper than
its post-processing counterpart, but it still requires sweeping over all elements of $\mathcal{P}$ at each iteration.
This is prohibitive in problems with large $|\mathcal{P}|$, such as additive models with a large number
of components. In future work, we will concentrate on algorithmic modifications to reduce this
cost, potentially inspired by probabilistic relabeling algorithms [17, 31], while preserving our
theoretical results. Third, we are interested in extending AMOR to trans-dimensional problems,
such as mixtures with an unknown number of components. Reversible jump MCMC (RJMCMC;
[14]) also suffers from label-switching and inferential difficulties. We will study algorithms that
combine RJMCMC and AMOR.
5. Appendix: proofs

Throughout the proof, let $A_\pi > 0$ be such that

$$x \in \mathbb{X} \implies \|x\| \le A_\pi . \qquad (5.1)$$
For any function $f : D \to \mathbb{R}$, we will denote $\|f\|_\infty = \sup_{x \in D} |f(x)|$.

5.1. Preliminary results 

We restate (with a slight adaptation) Lemma 1 of the supplementary material from [5] that we 
will use extensively. 

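Lemma 5.1. Let $\theta \in \Theta$.

1. The sets $\{P V_\theta,\ P \in \mathcal{P}\}$ cover $\mathbb{X}$, and for any $P, Q \in \mathcal{P}$ such that $P \ne Q$, the Lebesgue
measure of $P V_\theta \cap Q V_\theta$ is zero.

2. Let $\lambda$ be a measure on $(\mathbb{X}, \mathcal{X})$ with a density w.r.t. the Lebesgue measure. Furthermore, let
$\lambda$ be such that for any $A \in \mathcal{X}$ and $P \in \mathcal{P}$, $\lambda(PA) = \lambda(A)$. Then $\lambda(V_\theta) = \lambda(\mathbb{X})/|\mathcal{P}|$.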

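Proof. (1) Let $\theta \in \Theta$. We first prove that for any $P, Q \in \mathcal{P}$ with $P \ne Q$, the Lebesgue measure
of $P V_\theta \cap Q V_\theta$ is zero. Observe that $P V_\theta \cap Q V_\theta \subset \{x : L_\theta(P^T x) = L_\theta(Q^T x)\}$ and $L_\theta(P^T x) = L_\theta(Q^T x)$ iff

$$(x - P\mu)^T P \Sigma^{-1} P^T (x - P\mu) = (x - Q\mu)^T Q \Sigma^{-1} Q^T (x - Q\mu) ,$$

or, equivalently,

$$x^T \left( P \Sigma^{-1} P^T - Q \Sigma^{-1} Q^T \right) x - 2 \mu^T \left( \Sigma^{-1} P^T - \Sigma^{-1} Q^T \right) x = 0 .$$

Then $\{x : L_\theta(P^T x) = L_\theta(Q^T x)\}$ is either a quadratic or a linear hypersurface, and thus of
Lebesgue measure zero, except if both $\Sigma^{-1} = R^T \Sigma^{-1} R$ and $\Sigma^{-1}\mu = R \Sigma^{-1} \mu$ with $R = Q^T P$.
Since $\mathcal{P}$ is a group, $R \in \mathcal{P}$ and the definition (3.1) of $\Theta$ now guarantees that these two conditions
never simultaneously hold when $\theta \in \Theta$.

We now prove that $\mathbb{X} \subset \bigcup_{P \in \mathcal{P}} P V_\theta$. For any $x \in \mathbb{X}$, there exists $P \in \mathcal{P}$ such that $L_\theta(Px) =
\min_{Q \in \mathcal{P}} L_\theta(Qx)$. Then $x \in P^T V_\theta$ and this concludes the proof since $\mathcal{P}$ is a group.
(2) Let $\theta \in \Theta$. Using item (1), it holds that

$$\lambda(\mathbb{X}) = \int_{\mathbb{X}} d\lambda = \sum_{P \in \mathcal{P}} \int_{P V_\theta} d\lambda = \sum_{P \in \mathcal{P}} \int_{V_\theta} d\lambda = |\mathcal{P}| \int_{V_\theta} d\lambda .$$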

□ 
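
Item 2 of Lemma 5.1 can be checked numerically in a toy setting. The sketch below assumes a two-dimensional target that is uniform on the square $[-1,1]^2$ (hence invariant under swapping the two coordinates), a group made of the identity and the swap, and an arbitrary choice of $(\mu, \Sigma)$; all of these are illustrative assumptions. The fraction of samples falling in the cell where the identity achieves the minimal loss should be close to $1/|\mathcal{P}| = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([-0.5, 0.5])
sigma_inv = np.linalg.inv(np.array([[1.0, 0.3], [0.3, 2.0]]))

def loss(points):
    """Row-wise quadratic loss (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = points - mu
    return np.einsum("ni,ij,nj->n", diff, sigma_inv, diff)

# Samples from a swap-invariant distribution on the square [-1, 1]^2.
x = rng.uniform(-1.0, 1.0, size=(100_000, 2))
swapped = x[:, ::-1]

# A point is in the cell when the identity does at least as well as the swap.
in_cell = loss(x) <= loss(swapped)
fraction = in_cell.mean()
```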

5.2. Differentiating the cross-entropy term in (3.11) 

Now, for $\theta \in \Theta$, let

$$\tilde{w}(\theta) = -\int \log \mathcal{N}(x|\theta)\, \pi_\theta(x)\, dx . \qquad (5.2)$$

Anticipating that we will need to differentiate the function $w$ defined in (3.11), of which $\tilde{w}$ is
the first term, we state and prove three lemmas and a proposition that yield the gradient of
$\tilde{w}$. Lemma 5.2 explicitly reformulates $\tilde{w}$ as a distortion measure in vector quantization [13].
Lemma 5.3 gives the gradient of a distortion measure for generic loss functions $L_\theta$ and a generic
open set $\Theta$. Its proof is adapted from [13, Lemma 4.10, page 44]. We then show in Lemma 5.4
that Lemma 5.3 applies to the loss function given by (2.1) and the set $\Theta$ given by (3.1). Finally,
Proposition 5.5 gives an expression of the gradient of $\tilde{w}$.

Lemma 5.2. For any 9 E Q, 

w{0) ^ -\ndet{j:) + - / mm L(^pf^^p^pT-^{x) tt{x) dx . 
Proof. Let G O. By definition of w and by Lemma 5.1, 

w{e) = I Indct(E) + ^ / Lg{x)TT{x) dx , 

2 2 Jyg 

where Vg and Lg are given respectively by (2.2) and (2.1). Upon noting that tt is invariant under 
the action of 7^, we compute 

\V\ / Lg{x)Tr{x) dx ~ / Lg{x)iT{x) dx — / Lg{P"^ x)Tr(x) dx . 
JVe p^p-'Ve p^pJPVe 

In addition, by the definition (2.2) of Vg, 

PVg^{xeX : Lg(P^x) = min Lg(Qx)} . 

Qev 

Then by Lemma 5.1, 

\V\ / Lg(x)Ti{x) dx = > / mm Lg{Qx)TT{x) dx = I mm Lg(Qx)'K(x) dx . 

Jve ^pJpVeQ^^ J Qev ' ' ' ' 

Finally, by the definition (2.1) of Lg, Lg{Qx) — L(^qt ^ qt-^q^^x), and this concludes the proof. 

□ 

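Lemma 5.3. Let $\Theta$ be an open subset of a Euclidean space, $r$ be a positive integer and $O \subset \Theta^r$ be an open set.
Let $\mathbb{X} \subset \mathbb{R}^d$ be a measurable set and $\pi$ be a probability density w.r.t. the Lebesgue measure on
$\mathbb{X}$. Let $\{L_\theta, \theta \in \Theta\}$ be a family of loss functions $\mathbb{X} \to \mathbb{R}$, satisfying

A. For $\pi$-almost every $x$, $\theta \mapsto L_\theta(x)$ is $C^1$ on $\Theta$ and for any $\theta \in \Theta$, there exists $h_0 > 0$ such
that

$$\int \sup_{\|h\| \le h_0} \frac{1}{\|h\|} \left| h^T \nabla_\theta L_\theta(x) \right| \pi(x)\, dx < \infty .$$

B. For any $\theta \in \Theta$, there exists $h_0 > 0$ such that

$$\int \sup_{\|h\| \le h_0} \frac{|L_{\theta+h}(x) - L_\theta(x)|}{\|h\|}\, \pi(x)\, dx < \infty .$$

C. For any $\theta = (\theta_1, \ldots, \theta_r) \in O$, the sets

$$V_{\theta_i} = \{x \in \mathbb{X} : L_{\theta_i}(x) \le \min_j L_{\theta_j}(x)\}$$
are measurable, cover $\mathbb{X}$ and for any $i \ne j$, the Lebesgue measure of $V_{\theta_i} \cap V_{\theta_j}$ is zero.

For $\theta = (\theta_1, \ldots, \theta_r) \in O$ define the function $\varphi : O \to \mathbb{R}$ by

$$\varphi(\theta) = \int \min_{1 \le i \le r} L_{\theta_i}(x)\, \pi(x)\, dx .$$

Then $\varphi$ is differentiable on $O$ and for $1 \le i \le r$,

$$\nabla_{\theta_i} \varphi(\theta) = \int_{V_{\theta_i}} \nabla_{\theta_i} L_{\theta_i}(x)\, \pi(x)\, dx .$$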

Proof. Let 6/ = (6*1, • • • , Or) G O. Set 



d{x,6) = min Lg.{x) . 

l<i<r ^ 

By definition of the function ip 

tp{9 + h) - tp{9) = J {d{x, e + h)~ d{x, 6)) tt{x) dx . (5.3) 

We now prove that 

/ T r, 

g^{x),hi) tt{x) dx 



hm \\h\\-^ (^{e + h) - ip{e) -j^f (^oM 

by applying the dominated convergence theorem. First, by Assumption C, 
^(0 + h)-^(0) - V / {VeMx),h,)Ti{x)dx 

= V / {d{x,e + \\) - d{x,e) ~ {\Ig.Lg\x),K))-K{x)dx . 

Now set 

Vg° = {x ^ X : Lg^{x) < ininj^iLe^ix)} 
and note that Vg. \ Vg. has measure zero under Assumption C. Then 



(^(6> + h) - (^(0) - V / {We^Lg^{x),h,)7T{x)dx 

= V / {d{x,e + h) ~ d{x,e) - {Wg.Le.{x),h^))Tr{x)dx . 

Let X € Vg.] under Assumption A, 9 Lg{x) is continuous on and there exists Ex such that 

< ^ d{x, + h) = Lg^+h, (x) . 

Then, by Assumption A, 

d{x,6 + h) - d{x,9) - {\Jg^Lg.{x), hi) = Lg^+hM) - Lg.{x) - {Wg^Lg^{x), h^) 

= C{0i,x,hi) 
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with \\hi\\^^C{9i,x,hi) — > when \\hi\\ — > 0. Hence, we proved that for any i < r and any 
Km \\h\\-^ (d{x,e + h) ~ d{x,e) - {Ve,Lg^{x),h,)) = 0. 
We now prove that there exists ho such that 

/ sup \\h\\^^\d{x,e + h.) - d{x,e) -y2{y 9, L0.{x), hi) {x)\TT{x)dx < +00 . (5.4) 

J \\h\\<ho 

First remark that for all $z$, $a = (a_1, \ldots, a_r)$, $b = (b_1, \ldots, b_r)$,

$$|d(z, a + b) - d(z, a)| \le \max_{1 \le i \le r} |L_{a_i + b_i}(z) - L_{a_i}(z)| . \qquad (5.5)$$

Indeed, assume without loss of generality that $d(z, a) \le d(z, a + b)$ and let $i$ be such that
$d(z, a) = L_{a_i}(z)$; then by definition of the distance $d$, $d(z, a + b) \le L_{a_i + b_i}(z)$, which proves Eq.
(5.5). Now, the proof of (5.4) is a consequence of Assumptions A and B and the inequality

$$\max_{1 \le i \le r} |L_{a_i + b_i}(z) - L_{a_i}(z)| \le \sum_{i=1}^{r} |L_{a_i + b_i}(z) - L_{a_i}(z)| .$$

□ 
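
Inequality (5.5) is an instance of the elementary fact that the minimum of finitely many numbers moves by at most the largest individual change. A quick numerical sanity check, with random arrays standing in for the loss values $L_{a_i}(z)$ and $L_{a_i+b_i}(z)$ (an illustrative substitution, not the paper's losses), can be sketched as follows.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each row plays the role of (L_{a_1}(z), ..., L_{a_r}(z)) for one point z,
# and the matching row of `b` the role of the perturbed losses.
a = rng.normal(size=(1000, 5))
b = rng.normal(size=(1000, 5))

gap = np.abs(a.min(axis=1) - b.min(axis=1))   # |min_i b_i - min_i a_i|
bound = np.abs(a - b).max(axis=1)             # max_i |b_i - a_i|
holds = bool(np.all(gap <= bound + 1e-12))
```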

Lemma 5.4. Under Assumption 1, the quadratic loss function given by (2.1), the set $\Theta$ given
by (3.1), and the open set

$$O = \{(P\mu, P\Sigma P^T) : P \in \mathcal{P},\ (\mu, \Sigma) \in \Theta\}$$

satisfy the assumptions of Lemma 5.3.

Proof. When taking derivatives with respect to a matrix, we shall use the "vec" notation during
computations. For a $d \times d$ matrix $A$, its vectorized form $\mathrm{vec}(A)$ is a $d^2$-vector such that $\mathrm{vec}(A)$
stacks the columns of $A$ on top of one another. In general, we refer to [8] for matrix algebra
notions.
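
As a small illustration of the "vec" notation, the following NumPy sketch stacks columns with Fortran (column-major) ordering and checks the standard identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ on random matrices. The helper name `vec` is only a convenience for this example.

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix on top of one another."""
    return a.reshape(-1, order="F")

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)   # (B^T kron A) vec(X)
small = vec(np.array([[1, 2], [3, 4]]))
```

Column-major ordering matters here: with NumPy's default row-major flattening, the Kronecker factors in the identity would have to be swapped.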

We check the conditions of Lemma 5.3. Denote by $r$ the cardinality of $\mathcal{P}$ and set $\mathcal{P} =
\{I_d, P_2, \ldots, P_r\}$, where $I_d$ is the $d \times d$ identity matrix and $P_1 = I_d$. We set

$$O = \{(\theta_1, \ldots, \theta_r) \in \Theta^r : \theta_i = (P_i \mu, P_i \Sigma P_i^T),\ 1 \le i \le r\} .$$

Note that for $\theta \in O$, $L_{\theta_i}(x) = L_{\theta_1}(P_i^T x)$ and $V_{\theta_i} = P_i V_{\theta_1}$. Now, we have

$$(\mu, \Sigma) \mapsto (x - \mu)^T \Sigma^{-1} (x - \mu) = \frac{1}{\det \Sigma} (x - \mu)^T \mathrm{Adjugate}(\Sigma) (x - \mu)$$

so that $\theta \mapsto L_\theta(x)$ is a rational function in the coefficients of $\mu$ and $\Sigma$ whose denominator
$\det \Sigma > 0$. In addition,

$$\sup_{\|h\| \le h_0} \frac{1}{\|h\|} \left| h^T \nabla_\theta L_\theta(x) \right| \le \|\nabla_\theta L_\theta(x)\| \le \|\nabla_\mu L_\theta(x)\| + \|\nabla_\Sigma L_\theta(x)\| .$$

The RHS is at most quadratic in $x$ (for fixed $\theta$). By Assumption 1, the RHS is $\pi$-integrable.
This proves Assumption A of Lemma 5.3.

We now prove Assumption B of Lemma 5.3. Let 6 G Q and set A9 = (A/i, AS). By standard 
algebra, we have 

(E + AE)"^ = - AE + o(|| AE||) 
for any matrix AS sucli that S + AS is invertible. Therefore, 

Le+A0{x) - Lg(x) = ~2{Afifj:-\x - n) - {x - fif^'^ AS J:^\x - /i) + 9, AO) , 

for some function 9, A9) such that 

\E{x,9,A9)\<C{9)M^\m\' 

and some constant C{9) (depending upon 9 but independent of a; and A9). The proof is concluded 
since, by Assumption 1, J \\x\\'^tt{x) dx < +oo. 

Finally, the sets Vg. are measurable for any 0i, • • • ,9r £ Q since {x, 9) ^ Lg{x) is continuous 
OTi X X Q. The proof of Assumption C of Lemma 5.3 is then concluded by application of 
Lemma 5.1. □ 

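We are now ready to state the final result of this preliminary section, and give the expression
of the gradient of $\tilde{w}$ defined in (5.2).

Proposition 5.5. Under Assumption 1, the function $\tilde{w}$ defined in (5.2) is continuously differentiable
on $\Theta$ and for any $\theta \in \Theta$,

$$\nabla_\mu \tilde{w}(\theta) = -\Sigma^{-1} (\mu_{\pi_\theta} - \mu) ,$$

$$\nabla_\Sigma \tilde{w}(\theta) = -\frac{1}{2} \Sigma^{-1} \left( \Sigma_{\pi_\theta} - \Sigma + (\mu_{\pi_\theta} - \mu)(\mu_{\pi_\theta} - \mu)^T \right) \Sigma^{-1} .$$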

Proof. Let r denote the cardinality of V and set V — {Id, ■ ■ ■ ; Pr)- Let 9 E Q. By Lemma 5.2, 
we have 

1. , ,„s 1 



w{9) — -lndet(S) + - min Lg.{x) 'n{x)dx 

2 2 / l<i<r 



where 9, = {P,tJL,Pj:~^Pl). 

We first consider the derivative w.r.t. jjt. We have 



\7 w{9) = iv„ / min Lg.{x) Tr{x)dx. 

Z J l<i<r 

By Lemmas 5.3 and 5.4 and the chain rule, we have 

i=l •' 



i=l 

where 



Ai = {x : Lg.{x) < min Lg^ (x)} = PiVg , 
3 
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with Ve defined in (2.2). Hence, by Lemma 5.1, and since tt is invariant under the action of V, 
we have 



Vf,w{9) = -S^^V / {x-n)TT{x)dx 



= -T,^^ J {x — fi)[rTT{x)lvg{x)]dx 

where we used the definition (3.8) of ^T^g. 

We now consider the derivative w.r.t. $\Sigma$, which we derive in a similar manner. We refer
to [8] for matrix algebra notions such as Kronecker products. First remark that, by standard
algebra and since $\Sigma$ is symmetric,

$$\nabla_{\mathrm{vec}(\Sigma)} \ln\det \Sigma = \mathrm{vec}(\Sigma^{-1}) .$$

Then recall that

$$\nabla_{\mathrm{vec}(\Sigma)} (x - \mu)^T \Sigma^{-1} (x - \mu) = -\Sigma^{-1}(x - \mu) \otimes \Sigma^{-1}(x - \mu) .$$



Now let, for A a matrix, = A(^ A. Using Lemmas 5.3 and 5.4 along with the chain rule, 
we compute 

Vvcc(s)S^(e) - ^vec(I]-i) 



1^(^^2)T f ^^^^^^^^ [{x~P,f,f^-\x-Pui)]^,^p,^p^7r{^)dx 
[^-\Plx-t,)f\{x)dx 



where we used the identities {A ® Bf ^ ® and {A ® B){C L») = {AC) (BD). A 
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change of variables now leads to 
Vvcc(s)w(0) - ^vec(S-i) 



2 —,JVe 

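where we used the distributivity of the Kronecker product, Lemma 5.1, and the definitions (3.8)
and (3.9) of $\mu_{\pi_\theta}$ and $\Sigma_{\pi_\theta}$. Finally, the identity $\mathrm{vec}(AXB) = (B^T \otimes A)\mathrm{vec}(X)$ allows us to write

$$\nabla_{\mathrm{vec}(\Sigma)} \tilde{w}(\theta) = -\frac{1}{2} \mathrm{vec}\left( \Sigma^{-1} \left[ \Sigma_{\pi_\theta} - \Sigma + (\mu_{\pi_\theta} - \mu)(\mu_{\pi_\theta} - \mu)^T \right] \Sigma^{-1} \right) .$$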

□ 

5.3. The Lyapunov function 

Lemma 5.6 establishes the existence of a Lyapunov function for the mean field h given by (3.10). 

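Lemma 5.6. Under Assumption 1, the mean field $h$ is continuous on $\Theta$, the function $w$ defined
by (3.11) is $C^1$ on $\Theta$ and

1. $\nabla_\mu w(\theta) = -\Sigma^{-1} h_\mu(\theta)$ and $\nabla_\Sigma w(\theta) = -\frac{1}{2} \Sigma^{-1} h_\Sigma(\theta) \Sigma^{-1}$.

2. $\langle \nabla w(\theta), h(\theta) \rangle \le 0$ on $\Theta$ and $\langle \nabla w(\theta), h(\theta) \rangle = 0$ iff $\theta \in \mathcal{L}$.

3. For any $M > 0$, the level set

$$\mathcal{W}_M = \{\theta \in \Theta : w(\theta) \le M\} \qquad (5.6)$$
is a compact subset of $\Theta$, and there exist $\delta_1, \delta_2 > 0$ such that

$$\inf_{\theta \in \mathcal{W}_M} \inf_{P \in \mathcal{P}^*} \|(I - P)\Sigma^{-1}\mu\| \ge \delta_1 \quad \text{and} \qquad (5.7a)$$

$$\inf_{\theta \in \mathcal{W}_M} \lambda_{\min}(\Sigma) \ge \delta_2 , \qquad (5.7b)$$

where $\lambda_{\min}(\Sigma)$ denotes the minimal eigenvalue of the real symmetric matrix $\Sigma$.

Remark 5.7. As a consequence of Lemma 5.6, observe that for any $M > 0$, there exists $\delta > 0$
such that $\mathcal{W}_M \subset \mathcal{K}_\delta$, where $\mathcal{K}_\delta$ is defined in (3.3).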

Proof. (Continuity of h) Since (/ — P)E"^/i 7^ on for any P ^ V* , it suffices to show that 
9 I— !■ fi^g and 9 h- !■ Stt^ are continuous. Since, by Lemma 5.1, the boundary of Vg is of Lebesgue 
measure zero, the continuity of i— )■ ^^^^ follows from Lebesgue's dominated convergence theorem 
if, for any x S X \ dVg, 9 1-^ xlyg (x) is continuous. To see this, note that if x is in the interior of 
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Ve, then there exists a neighborhood V of 6* such that for any 9' E V , x E Vq', and if a; G X \ Ve, 
which is an open subset of X, then there exists a neighborhood V of 6* such that for any 9' e V, 
X eX\Vg>. 

The case of i— > is similar and omitted. 

(w is on 8 j It is shown in [5, Proposition 3 of the supplementary material] that the first 
term in the RHS of (3.11) is continuously differentiable on Q. Since ||(/— ^ for any 

P E V* and {fi, E) E G, the second term in the RHS of (3.11) is continuously differentiable on 
8. By [5, Proposition 3 of the supplementary material], it holds for any 6* = (/i. E) e 8 that 

= ~E-i(/i., - m) + «E ||(/ _ P)S]-Vr ^''^^^''^ ^ -^-'h^{9) 
S/^w{9) = -^E-i(E,, -E + (Ai-M.J(M-A*^J^)S-^.. 

' f E ||(j_p|s-1m||4 (MM^S-it/p) E-i + C/pE-Vm^ 
= -iE-i/is(e)E-i . 
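Hence, upon noting that $h_\Sigma(\theta)$ and $\Sigma^{-1}$ are symmetric,

$$\langle \nabla w(\theta), h(\theta) \rangle = -h_\mu(\theta)^T \Sigma^{-1} h_\mu(\theta) - \frac{1}{2} \mathrm{Trace}\left( \Sigma^{-1} h_\Sigma(\theta) \Sigma^{-1} h_\Sigma(\theta)^T \right)$$

$$= -h_\mu(\theta)^T \Sigma^{-1} h_\mu(\theta) - \frac{1}{2} \mathrm{Trace}\left( \Sigma^{-1/2} h_\Sigma(\theta) \Sigma^{-1} h_\Sigma(\theta) \Sigma^{-1/2} \right) .$$

The first term of the RHS is nonpositive since $\Sigma \in \mathcal{C}_d^+$ and the second term is nonpositive since
$(A, B) \mapsto \mathrm{Trace}(A^T B)$ is a scalar product. Therefore $\langle \nabla w(\theta), h(\theta) \rangle \le 0$ with equality iff $\theta \in \mathcal{L}$.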
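($\mathcal{W}_M$ is compact) We prove (5.7a). By the definition (3.11) of $w$, for any $\theta \in \mathcal{W}_M$, we have

$$-\int \log \mathcal{N}(x|\theta)\, \pi_\theta(x)\, dx + \frac{\alpha}{2} \sum_{P \in \mathcal{P}^*} \frac{1}{\|(I - P)\Sigma^{-1}\mu\|^2} \le M .$$

In particular, the first term in the LHS is a cross-entropy, and it is thus non-negative (alternatively,
see [5, Proposition 1 of the supplementary material]). Consequently, for any $\theta \in \mathcal{W}_M$ and any $P \in \mathcal{P}^*$, we
have

$$\frac{1}{\|(I - P)\Sigma^{-1}\mu\|^2} \le \frac{2M}{\alpha} .$$

This yields $\|(I - P)\Sigma^{-1}\mu\|^2 \ge \frac{\alpha}{2M}$ for any $P \in \mathcal{P}^*$, thus concluding the proof of (5.7a).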

We now prove (5.7b). Let 9 — (/i, E) E Wm- Denote by (Aj(E))i<d the eigenvalues of S. Since 
E is symmetric, there exist dx d matrices Qq, Kq such that E — Qe^eQj , Qe is orthogonal, and 
Ag = diag(A,(E)). Then 



2M > 2w{9) > \ogAf{x\9)7Tg{x) dx 

^ dlog(27r)+logdetE + (/i^, -M)^E~i(/i^, -/x)+Trace(E-iE^J (5.8) 

d 

> El°g^«(^) + + Tracc(E-iE^J . 
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Set h{e) = {Qj^^.Qeh. Then 

Trace(S-il],J = Trace(QeA^iQ^S,J - Trace(Q^S,,QeA^i) = ^ ^ . (5.9) 
Therefore, for any 9 G yV_A/, 

^logA,(^) + M^<2M. (5.10) 

We now prove that for any i, infyvM h > 0. This property, combined with (5.10), wiU conclude 
the proof of (5.7b). Let e > be such that 2''e||7r||ooA^-i < \r\ , and for v e {x e R'^ : 
\\x\\ = 1}, let 

BJ(0) = {a:e Supp(^)ny9 : \{x~ti^,,v)\ < e}. (5.11) 
Note that by Assumption 1, 

^(5^(0)) < ||7r|ULeb(SJ(0)) < 2'^e||7r|UAri . 

Then, by definition of e, 

7T{Vg \ BUe)) > \V\ ~ 2'^e||7r||«,A^-i > . (5.12) 
Now, if (ci) denotes the canonical basis of M'', then 

bi{0) = \V\efQ]) (^J {x - fi^g){x - fi^g)'^Tr{x) dx^ QeCi 

= \V\ {Qeei)'^{x - id.^g){x - fi.^g)'^Qgei tt{x) dx 
Jve 

= I'PI / {x - ^„g,Qge,)'^TT{x) dx 

JVg 



> \V\ {x - fj.^i,,Q0ei)^TT{x) dx 

JVe\B7''''-(e) 

> e'\V\n{Ve\BQ«^'{e)) , (5.13) 

where the last inequality follows from the definition (5.11) of B'^'^'^^[9). Thus, by (5.12), bi{9) is 
bounded away from zero on Wm- 

As w is continuous on 0, {6 E Q, w{9) < Af } is closed. From (5.7b), (5.8) and Assumption 1, 
A* {fJ-ire ~ I^Y''^~^{lJ'TTe ~ A*) bounded on Wm- In addition, (5.8), (5.9) and (5.13) imply that 
E logdetS is bounded on Wm- These properties combined with (5.7b) imply that Wm is 
bounded. Hence Wm is compact. 

□ 

5.4. Proof of Proposition 3.1

(1) By the definition (3.1) of 8 and Lemma 5.1, V6' G 6,.x G X, it holds that 
/ <le{x,y) dy ^ ^ M{Py\x,cS) dy = I . 
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(2) Let {Xt)t>o and {9t)t>o be the random processes defined by Algorithm 2. We prove that for 
any measurable positive function /, 

E[f{Xt)\Xo,eo,...,Xt-i,et^i] - J f{xt)Pe,_AXt-uXt) dxt ,w.p.l. 

Let / be measurable and positive. Let {P, X) be the r.v. defined by Steps 5 and 6. Let t/ be a 
uniform r.v. independent of a{XQ,9o, ■ ■ ■ , Xt~i,9t-i, P, X). By construction, it holds that 

E[f{Xt)\Xo, 00, ... , Xt-i, ^t-i] = E[/(PX)ly<„^^_^(^^_^_p^)|Xo, 00, ... , Xt-i, 0t-i] 

+ E[/(X,_i)lc/>a.,_,(x._,.px)l^o,0o,...,^t-i,0t-i] , (5.14) 

where ag{x, y) is given by (3.5). Since U is independent of the past and from P and X, we have 

E[/(PX)l[/<„^^_jXt_i,Px)l^o,0o, . . . 6*4-1] 

f{PX)(\-ae,_AXt-l,PX))\X^,eo,...,Xt^^,et^ , (5.15) 



= E 
and 

E[/(^t-i)ly>„^^_jj(-^_^ px)|^o,0o, . . . ,Xt-i,9t-i\ 

= /(Xt_i) E[(l-ae,_i(^*-i,i^^)) |^o,0o,...,^t-i,0t-i] . (5.16) 

Now note that the projection mechanism (Steps 15 to 17 of Algorithm 2) guarantees that 
9t-i G O with probability 1. By Lemma 5.1, 6* e implies X ~ Up(PVe) and 

yP.QeV such that P^Q, LehiPVg n QVg) = 0. 

Thus, for any measurable and bounded function 95 : X x — M, we have 

/ ip{x,9)dx— / (p{x,9)dx. 
Applying this decomposition to (5.15) yields 

^[fiPX)li,<ae^_^(Xt-i^PX)\Xo,9Q, . . .,Xt-i,9t-i] 

= E/ fe(-P^) lv.,_, (Px)AA(x|Xt-i, cSt-i) dx 

= E / hiPx)—^-—lv,^^{Px)U{x\Xt^,,cEt-i)dx, 

where N{x,9) = \{Q G V/Qx e Vg}\. Using Lemma 5.1 again, 

9ee,xi Up^QiPVg n QVg) ^ N{x, 9) = 1, 
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and thus 

HfiP^)^U<ae,_^{Xt-i,PX)\^0,do, ■ ■ . ,Xt_i,0t_i] 

= E / HPx)lve,_SP^)^i^\^t-l^''^t~l) dx 

= E/ h(y)ive,_,{yW{p-'y\Xt-i,cJ:t.i) dy 

h{y)(let-i{Xt-i,y) dy , 



Pev • 



where in the last step we used the fact that T' is a group. Similarly, 

E 



((1 - a9t-i(^t-i:2/)) <let^AXt-i,y) dy; 

and this concludes the proof. 

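(3) Let $\theta \in \Theta$. Eqn. (3.4) implies that if $x \in V_\theta$, then $P_\theta(x, V_\theta) = 1$. To prove that $\pi_\theta P_\theta = \pi_\theta$,
it is sufficient to check the detailed balance condition, which states that

$$\forall A, B \subset \mathbb{X} \text{ measurable}, \quad \int_A \pi_\theta(x) P_\theta(x, B)\, dx = \int_B \pi_\theta(y) P_\theta(y, A)\, dy .$$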


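We consider the two summands in the definition (3.4) separately. First, it holds that

$$\pi_\theta(x) \alpha_\theta(x, y) q_\theta(x, y) 1_{V_\theta}(y) = |\mathcal{P}| \left( \pi(x) q_\theta(x, y) \wedge \pi(y) q_\theta(y, x) \right) 1_{V_\theta}(x) 1_{V_\theta}(y)$$

$$= \pi_\theta(y) \alpha_\theta(y, x) q_\theta(y, x) 1_{V_\theta}(x) ,$$

so

$$\int_A \pi_\theta(x) \left[ \int_{B \cap V_\theta} \alpha_\theta(x, y) q_\theta(x, y)\, dy \right] dx = \int_B \pi_\theta(y) \left[ \int_{A \cap V_\theta} \alpha_\theta(y, x) q_\theta(y, x)\, dx \right] dy .$$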



Secondly, 



'Kb(x)\b{x) \ (l — ag{x, z)) qg{x, z) dz dx 

A Jve 



TTe{x) / {1 — ag{x, z)) qg{x, z) dz dx 

AnB Jve 

7re(y)lA(y) / {'^- ag{y,z))qg{y,z)dz . 

B JVg 



This concludes the proof of the detailed balance condition. 
5.5. Regularity in $\theta$ of the Poisson solution
Lemma 5.8.

1. For any $M > 0$, there exists $\rho \in (0, 1)$ such that for any $x \in \mathbb{X}$ and any $\theta \in \mathcal{W}_M$,
$\|P_\theta^n(x, \cdot) - \pi_\theta\|_{TV} \le 2(1 - \rho)^n$.

2. Under Assumption 1, for any $\theta \in \Theta$, there exists a solution $H_\theta$ of the Poisson equation
$g - P_\theta g = H(\cdot, \theta) - \pi_\theta H(\cdot, \theta)$. Furthermore, for any $M > 0$,

$$\sup_{\theta \in \mathcal{W}_M} \sup_{x \in \mathbb{X}} |H_\theta(x)| < \infty . \qquad (5.17)$$

Proof, (of Item i j It is sufBcient to prove that there exists p G (0, 1) such that for any x G X 
and 9 G Wm, Peix,-) > pne (see e.g. [20, Theorem 16.2.4]). By (3.4), for any a; G X and 
A £ X, Pe{x,A) > /^py-g 0!g{x,y)qe{x,y) dy. By Lemma 5.6, there exists a > such that for 
any (/^, S) G Wm: any m, z G X, and any P € V, we have Af{Pz\m,,'E) > a. Thus, for any 
9 G Wm and y eVe, it holds that 

aeix,y)qe{x,y)lvAy)>a\V\ ( 1 A IvAv) > jr^Mv) ■ (5-18) 



t:{x) J " \\-k\\ 

Thus, we have Pe(x, ■) > png for any x G X and 9 G Wm with p — a/\\Tr\ 
(Proof of Item 2) 



J2Pe{Hix,e)^MH{;9))) 



< sup \\H{;9)\\ooy2\\P^ix,-)-7rg\\TV 

< 2 sup \\Hi-,9)\\^p-\ (5.19) 

eeWM 



Since the sup is finite by Lemma 5.6, the series ^ Pg(^H{x, 9) — TTg{II{-, 0))) converges. Finally, 
note that 

He{x) = E Pe{Hix, 0) - 7rg{H{;9))) 

n 

is a solution of the Poisson equation, and that supggyy^^^ ^.^^ l^e(2;)| < oo. □ 
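
The geometric bound of Item 1 rests on a uniform minorization of the kernel. The same mechanism can be observed on a toy finite-state chain, sketched below under illustrative assumptions: the transition matrix is random with positive entries, the minorizing measure is built from the column-wise minima of that matrix rather than from $\pi_\theta$, and the total variation norm is taken as the $\ell_1$ distance, matching the $\sup_{\|f\|_\infty \le 1}$ convention used in this appendix.

```python
import numpy as np

rng = np.random.default_rng(4)
n_states = 6

# Random transition matrix with strictly positive entries.
P = rng.uniform(0.1, 1.0, size=(n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Minorization constant: P(x, y) >= min_x' P(x', y) for every x, and the
# column minima sum to rho, so P(x, .) >= rho * nu(.) for a probability nu.
rho = P.min(axis=0).sum()

# Stationary distribution: left eigenvector associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()

Pn = np.eye(n_states)
checks = []
for n in range(1, 21):
    Pn = Pn @ P
    tv = np.abs(Pn - pi).sum(axis=1).max()   # worst starting state
    checks.append(tv <= 2 * (1 - rho) ** n + 1e-12)
```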

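Lemma 5.9. Let $M > 0$ and $\kappa \in (0, 1/2)$. Under Assumption 1, there exists $C > 0$ such that
for any $\theta \in \mathcal{W}_M$ and $\theta' \in \Theta$, it holds that

$$\mathrm{Leb}(V_\theta \setminus V_{\theta'}) \le C \|\theta - \theta'\|^{1 - 2\kappa} , \qquad (5.20)$$

where $\mathrm{Leb}(A)$ denotes the Lebesgue measure of the set $A$.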

Proof. We prove that there exist C,h > 0, such that for any 9 G Wm and any 9' £ Q such that 
\\9 - 9'\\ < h, LehiVg \ Ve>) < C\\9 - 9'\\^-^''. Note that since C X and since X is bounded, 
there exists C > such that Leb(ye \Vg>) < C. Therefore, (5.20) holds with C = C V C/h^^"^". 

By Lemma 5.6, w is uniformly continuous on Wm+i, and there exists Hq > small enough 
for which 

[9 G Wm, 61' g e, lie* - 9'\\ < ho] ^ Vit G [0, 1], 9 + u{9' - 9) e Wm+i ■ (5.21) 
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Let h < ho. Let 9 = (/i, S) £ Wm and 0' ^ 9 such that ||6l - e*']] < h. 

By definition of the set V^, for any x G Ve \ V^', there exists P £ V* such that L0'{x) — 
L0'{P^x) > and Lg{x) — Lg{P'^x) < 0. Since i9 L^{x) — L^{P'^x) is continuous on Wm+i, 
there exists u G [0, 1] depending on x, 9, 9' , and P such that Lg^^(^g,_g-^{x)—LQ_^_^(^Q,_g-^{P'^x) — 0. 
Therefore 

Ve \Vg,c U Vp, 

where 

Vp= \J Z{Lg+^^e,_g)i-)-Lg+.,^g,^e^{P^-))nX; (5.22) 

tiG[0,l] 

and Z{f) denotes the zeros of the function /. The proof proceeds by showing that for any 
P E V* , Vp is included in a measurable set with measure O {\\9 — 6''||^^^'^). 

Let P e V*. Let B(0,A^) = {y e : \\y\\ < A^}, where is defined by 5.1. For any 
X e B{0, A^), define 

lg{x) = 2fi'^T.-^{I - P^)x , 
qg{x) = x^iY.-^ - PJ:-^P'^)x , 
Bo' - U e B{0, A,) : \lg{x)\ <\\9~ 9'\n . 

Denote by S the unit sphere {x & R'^ / \\x\\ ~ 1}. Let u € [0, 1] and tv e Z{^Lg_^^i,(^g,_g-^{-) — 
Lej^u{e' -e){P'^ ■)) H X where t G [0, A^] and w e S. Upon noting that for any -d S Wm+i, 

Li}{tv) - Lf,{tP'^ v) ^ t{qi){v)t - hiv)) , (5.23) 

we consider several cases: 

(i) tv € Be,e'. 

(ii) ^ Bg^e' and (w) = 0. Then, by (5.23), loj^u(e'-e){'tv) — which implies that 
tv € Bg^gi. This yields a contradiction. 

(in) tv 4- Be, 9' and qg+u{6' -e){v) ^ 0. Then t ^ and, by (5.23), 

^^Wz^^ (5.24) 
9e+«(e'-e)(w) 

Since we assumed t e [0, Att], this ratio is positive. In order to characterize the point tv, 
additional notations are required. First, note that by Lemma 5.6, there exists Ci > such 
that for any 9 = {fi, S) G Wm+i, 

\\9-9\\ <ho^\\±-^ -Ys-^W <Ci||S-I]|| . 

Thus, there exists C2 > such that for any 9 e Wj\/+i, H^* — 6*11 < /lo, and for any 
a:GB(0,A^), 

\lg{x) - lg{x)\ = 2|^^[E-i-E-i](/-P^)x + (/i-/x)^E-i(/-P^)a 

< C2||^-6i||. (5.25) 
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Note that since G S(0, A^), C2 does not depend on x and 6. Similarly, there exists 
C3 > such that for x G B{0, A^) and 9 e Wm+i satisfying ||^ - 6*11 < ho, 



\q^ix) - qeix)\ < c^we - e\\ . 

We can assume without loss of generality that h is small enough so that 



<h 



'r-iC2 + 2C3A^)\\9-e'\\ > i| 



We now distinguish three subcases, 
a) 1; e Be,e'- 

h) V ^ Bg^g' and qe{v) ^ 0. Since t G [0,A^], (5.24) implies that 

\lg+u(g'-e)iv)\/A^. Since v ^ Bgy, \lg{v)\ > ^'f and by using (5.25), 



(5.26) 



(5.27) 



(^)l > 



\le+u(e'~e)\ > \le{v)\ - \lg+^ 



ig(v) >\\e 



C2 



Hence, it holds that \qg+^^e,_e){v)\ > {\\e - e'l]"" - C2\\0 - 6l'||)/A^, and, by (5.26), we 
have 199(^^)1 > \'ld+u(e'-e){v)\ — C^WO — 9'\\. These inequalities together with (5.25) and 
(5.27) lead to 



t 



le{v) 



k+u{e'-e){v) lg{v) 



g+u(e'~e){v) qg{v) 



for some C4 > 0. 
c) V ^ Bg^g, and qg{v) = 0. Then by (5.25) and (5.26), 

^^\\0-o'r-C2\\e-9'\\ 



> 2A, 



which is in contradiction with the assumption that t < A^. 

As a conclusion, we have just proved that Vp is included in the union of three sets defined 
by Bg^gi (case i), by {tv : t G [0, A^r], w G S n Bg^gi} (case iiia), and by 



tv:ven,v(^ Bg^e,,qgiv) 7^ 0, < t < A^, 



leiv) 



< d 



nni 1-2k 



(case iiic). This concludes the first step. 

The second step consists in computing an upper bound for the Lebesgue measure of each 
of these three sets. For simplifying the presentation, we detail the case d = 2 and use polar 
coordinates {p,(l))', the argument remains valid when d > 2 using generalized spherical coor- 
dinates. Define tg^fj)) = lg{e^'^)/qg{e'^'^). Rephrasing the conclusion of the first step, we have 

C ULi with 



V 



(2) 
P 

;(3) 



{(p,0)/pG[O,A,],e^^GBe,e'} 



Vp" = {{p, 0) /e^^ ^ Bg,g,,qg{e^^) ^ 0, < p < A^, |p - tg{^)\ < C40 - 9'^-'^} . 
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These sets are Borel sets. By definition of Wm, Iq is not identically zero and thus 

(2) 

for some C5 > as a consequence of Lemma 5.6. For Vp , note that it is upper bounded by the 
reunion of the two circular sectors in bold lines in Figure 7. This area is easily bounded by the 
area of the outer rectangle, which is proportional to \\6 — d'\\^~'^'^. Finally, 



Leb(V 



^^^^ - ' — lqe(e'*)#0 d(l> ■ 

ov(te(0)-C4||e-e'||i-2'=) 











Jo 


_ 2 _ 



We can assume without loss of generality that h is small enough so that 2C4h^ < A^. 
Therefore, we can partition [0, 2tt] — AU BU C, where 

A = [0,27r] / tg{(j))-C4\\9- e'\\^-^''>0 and te{(j))+ 040-6' W^-^"" <A^} , 

6 = {0e [0,27r] / te{(l)) - C^e - e'W^-^'' > and tg{(j)) + - e'W^-^'' > A^} , 
C = {0e [0,27r] / te{(t))- 040- e'll^-^"" <OandO <te{(f))+ 040- e'll^-^"" <A„} . 

This yields 

Leb(vl?)) < 2C4j^temO-O'\\'-^''dcj,+ ^J^(^Al~{tei^)-C40-9'\\'-'-f) dcj> 

^ ^ {tg{4>) + C46-e'\\^-^- f dcj, (5.28) 



c 



2 _ 

< -e'W^^^" , (5.29) 



for some Cg > 0, since on A < tg{(j)) < A^, on {te{(j)) - C40 - e'll^-^"")^ > {A^-2C4e- 
6l'||i-2'=)2, and on C, |ie(<^)| < C4e - 6''||i-2«. 

This concludes the proof. □ 

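Lemma 5.10. (Regularity in $\theta$ of the invariant distribution $\pi_\theta$)

Let $M > 0$ and $\kappa \in (0, 1/2)$. Under Assumption 1, there exists $C > 0$ such that for any $\theta \in \mathcal{W}_M$
and $\theta' \in \Theta$,

$$\|\pi_\theta - \pi_{\theta'}\|_{TV} \le C \|\theta - \theta'\|^{1 - 2\kappa} .$$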



Proof. By definition of the total variation, 



sup 

ll/l|oo<l 



\V\ sup 

ll/ll=c<l 



f{x)TTg{x) dx~ f{x)TTe'{x) dx 



f{x)Tr{x) dx — 



Ve\Vg, 



f{x)'K{x) dx 



VB,\Ve 



Since 



< \V\{TT{Ve\Vg')+7riV9>\Ve)) . 

Vg, \Vg = Vg\ (Vg H Vg,) , Vg \ Vg, = Vg \ [Vg H Vg,) , 
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it holds that 

TTiVe' \Ve) = ^^- Tr{Ve nVe-) = Tr{Ve \Ve') , 

where we used Lemma 5.1. Then, by Assmnption 1 and Lemma 5.9, there exists C > such 
that for any 9 G Wm and 9' £ 6, 

he - TTe'hv < 2\\n\\^Leh{Ve \Ve>) < C\\9 - 9'\\^~^- . 

□ 

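Lemma 5.11. (Regularity in $\theta$ of the kernels $P_\theta$)

Let $M > 0$ and $\kappa \in (0, 1/2)$. Under Assumption 1, there exists $C > 0$ such that for any $\theta \in \mathcal{W}_M$
and $\theta' \in \mathcal{W}_{M+1}$,

$$\|P_\theta(x, \cdot) - P_{\theta'}(x, \cdot)\|_{TV} \le C \|\theta - \theta'\|^{1 - 2\kappa} .$$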
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\Pef{x)-Pe,f{x)\ < 



f{y){aeix,y)qg{x,y)lvs{y) - ag,{x,y)qg'{x,y)lvg, {y)j dy 



-1/(^)1 



ae'{x,y)qe'{x,y)lvg, (y) ~ ag{x,y)q0{x,y)lvg{y)) dy 



< 2||/||oo J ^ag{x,y)qg{x,y)lvg{y) - ag'{x,y)qg'{x,y)lv,,iy) 

4 

= 2\\f\\^Y.^o,e'i^)^ 



dy 



(5.30) 



where 

^le'ix) 

and 



Ae(x)nAg,(x) 
f 

TZe(x)mZg,{x) 
f 

Aeix)mZg,{x) 
f 

'R.eix)nAg,(x) 



ag{x,y)qe{x,y)lv,{y) - ag,{x,y)qg,{x,y)lv,, (y) 
ae{x,y)qg{x,y)lvg{y) - ag'{x,y)qg'{x,y)lv,, (y) 
ae{x,y)qg(x,y)lv(,{y) - ag'{x,y)qg'{x,y)lvg, (y) 
ag{x,y)qg{x,y)lve{y) - aB'ix,y)qg'{x,y)lvg, (y) 



dy ; 

dy 
dy 
dy 



Ag{x) = {y : ag{x,y) = 1} , Tlgix) = {y : ag{x,y) < 1} 
We now upper bound each term. 



Ai n,(x) 



Ag{x)nAg,{x) 



< 



J2 (ly. {yWiQy\x, S) - ly,, iy)Af{Qy\x, S'; 

Q<£V 



dy 



Qev 



^vs'iy) E WiQy\x,^)-Af{Qy\x.,^')\dy 

Qev 



(5.31) 



By Lemma 5.6, there exist a,b > such that for any 9 S Wa/+i, m, z g X, and Q € V, we have 

a<U{Qz\m,cT.) <b , (5.32) 
so that the first term in the RHS of (5.31) is bounded by 

/'|lv,(2/)-ly,,(2/)| ^ AA(Q2/|a:,S)dy < \r\b f \lv,{y) - lvg,{y)\ dy 
J Qg.p J 



= I'Plb J {lv,\v„{y) + iv„\Vo{y)) dy 
< C\\9-9'\\^-^'' , 
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where we used Lemma 5.9. Let us now consider the second term of the right-hand side of (5.31). 
Using the uniform continuity of w on Wm+i (see Lemma 5.6), there exists h small enough such 
that 

9eWM,\\h\\<h^9 + heWM+i- (5.33) 
For any 9 e Wm, 9' e Wai+i such that ||0 - > h, there exists Ci such that 

\mQy\x,^) - ^f{Qy\x,^')\dy < Ci\\9 - 9'\\'-^^ . 

Qev 

Assume now that 9 G Wm, 9' G Wm+i and \\9 - 9'\\ < h. Denote by 

Et = (l-t)S + <S' . (5.34) 
By (5.33) and (5.7b), T,^^ exists and supt<]^ ggyy^^ g/gyy^^^^^ ll^^r^ll < We can then write 



|AA(Q2;|x,E)-AA(Qy|x,E')| = / f^iQy\x,^t) 



-\ogM{Qy\x,J:t) 



dt 



< b 



dt 



logU{Qy\x,J:t) 



dt . 



(5.35) 



In addition, by Assumption 1, there exists C2 such that 
d 



dt 



\ogM[Qy\x,^t) 



[x - Qyf^^\^' - m7\x - Qy) < C2\\9 -9'\\. (5.36) 



We thus have proved that 

[9eWM ,9' eWm+i ,\\9-9'\\ <h]=^ \Af{Qy\x,^)-Af{Qy\xX)\ < C\\9 - 9'\\ . 

Therefore, it is established that HA^ g,||oo < C\\9 - ^'H^-^k, 

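Let us consider the second term $\Delta^{2}_{\theta,\theta'}(x)$ in the RHS of (5.30). Note first that if $x \in \mathbb{X}$ and
$y \in \mathcal{R}_\theta(x) \cap \mathcal{R}_{\theta'}(x)$, then by (5.32), $\pi(y)/\pi(x) \le b/a$, so

\[
\begin{aligned}
\Delta^{2}_{\theta,\theta'}(x)
&= \int_{\mathcal{R}_\theta(x)\cap\mathcal{R}_{\theta'}(x)} \frac{\pi(y)}{\pi(x)} \Big| \sum_{Q\in\mathcal{P}} \big( \mathbb{1}_{V_\theta}(y)\,\mathcal{N}(Qx|y,\Sigma) - \mathbb{1}_{V_{\theta'}}(y)\,\mathcal{N}(Qx|y,\Sigma') \big) \Big| \, dy \\
&\le \frac{b}{a} \int \Big| \sum_{Q\in\mathcal{P}} \big( \mathbb{1}_{V_\theta}(y)\,\mathcal{N}(Qx|y,\Sigma) - \mathbb{1}_{V_{\theta'}}(y)\,\mathcal{N}(Qx|y,\Sigma') \big) \Big| \, dy .
\end{aligned}
\]

Therefore, repeating the above discussion for the bound of $\Delta^{1}_{\theta,\theta'}(x)$, it is established that
$\|\Delta^{2}_{\theta,\theta'}\|_\infty \le C \|\theta - \theta'\|^{1-2\kappa}$.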

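To deal with $\Delta^{3}_{\theta,\theta'}(x)$, first observe that there exists $C > 0$ such that for any $\theta \in \mathcal{W}_M$,
$\theta' \in \mathcal{W}_{M+1}$, and $x, y \in \mathbb{X}$, we have

\[ \Big| \frac{q_\theta(y,x)}{q_\theta(x,y)} - \frac{q_{\theta'}(y,x)}{q_{\theta'}(x,y)} \Big| \le C \|\theta - \theta'\| \tag{5.37} \]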






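because of (3.6), (5.32), and the above discussion for the upper bound of $\Delta^{1}_{\theta,\theta'}(x)$. Now let
$y \in \mathcal{A}_\theta(x) \cap \mathcal{R}_{\theta'}(x)$, then we have

\[ \frac{\pi(y)\, q_{\theta'}(y,x)}{\pi(x)\, q_{\theta'}(x,y)} < 1 \le \frac{\pi(y)\, q_\theta(y,x)}{\pi(x)\, q_\theta(x,y)} , \]

which, combined with (5.37), yields

\[ 1 - C\, \frac{\pi(y)}{\pi(x)}\, \|\theta - \theta'\| \le \frac{\pi(y)\, q_{\theta'}(y,x)}{\pi(x)\, q_{\theta'}(x,y)} < 1 . \]

Thus,

\[
\begin{aligned}
\Delta^{3}_{\theta,\theta'}(x)
&= \int_{\mathcal{A}_\theta(x)\cap\mathcal{R}_{\theta'}(x)} \Big| q_\theta(x,y)\,\mathbb{1}_{V_\theta}(y) - \frac{\pi(y)\, q_{\theta'}(y,x)}{\pi(x)\, q_{\theta'}(x,y)}\, q_{\theta'}(x,y)\,\mathbb{1}_{V_{\theta'}}(y) \Big| \, dy \\
&\le \int \Big\{ \big| q_\theta(x,y)\,\mathbb{1}_{V_\theta}(y) - q_{\theta'}(x,y)\,\mathbb{1}_{V_{\theta'}}(y) \big| \\
&\qquad \vee \Big| q_\theta(x,y)\,\mathbb{1}_{V_\theta}(y) - q_{\theta'}(x,y)\,\mathbb{1}_{V_{\theta'}}(y) + C\, \frac{\pi(y)}{\pi(x)}\, \|\theta - \theta'\|\, q_{\theta'}(x,y)\,\mathbb{1}_{V_{\theta'}}(y) \Big| \Big\} \, dy .
\end{aligned}
\]

Therefore, it is established that $\|\Delta^{3}_{\theta,\theta'}\|_\infty \le C \|\theta - \theta'\|^{1-2\kappa}$.

The upper bound of $\Delta^{4}_{\theta,\theta'}(x)$ is similar and thus its proof is omitted. □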

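Lemma 5.12. (Regularity in $\theta$ of the solution of the Poisson equation)

Let $M > 0$ and $\kappa \in (0, 1/2)$. Under Assumption 1, there exists $C > 0$ such that for any $\theta \in \mathcal{W}_M$
and $\theta' \in \mathcal{W}_{M+1}$,

\[ \| P_\theta \hat H_\theta - P_{\theta'} \hat H_{\theta'} \|_\infty \le C \|\theta - \theta'\|^{1-2\kappa} . \]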



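Proof. We recall the following result, proved in [12, Lemma 5.5, page 24]: there exists $C > 0$
such that for any $\theta \in \mathcal{W}_M$, $\theta' \in \mathcal{W}_{M+1}$, and $x \in \mathbb{X}$,

\[
\| P_\theta \hat H_\theta - P_{\theta'} \hat H_{\theta'} \|_\infty
\le C \| H(\cdot,\theta) - H(\cdot,\theta') \|_\infty
 + C \sup_{\theta \in \mathcal{W}_{M+1}} \| H(\cdot,\theta) \|_\infty \Big\{ \| \pi_\theta - \pi_{\theta'} \|_{TV} + \sup_x \| P_\theta(x,\cdot) - P_{\theta'}(x,\cdot) \|_{TV} \Big\} .
\tag{5.38}
\]

Here $\sup_{\theta \in \mathcal{W}_{M+1}} \| H(\cdot,\theta) \|_\infty$ is finite by Lemma 5.6. Now, by Lemma 5.6 again, there exists $C > 0$
such that for any $\theta \in \mathcal{W}_M$ and $\theta' \in \mathcal{W}_{M+1}$,

\[ \| H(\cdot,\theta) - H(\cdot,\theta') \|_\infty \le C \|\theta - \theta'\| . \]

The upper bounds for the two last terms in the RHS of (5.38) result from Lemmas 5.10 and
5.11, respectively. □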



5.6. Proof of Theorem 3.2 



We start by proving two lemmas. 
Lemma 5.13. Let $(\gamma_t)_{t \ge 0}$ be a sequence such that $\sum_t \gamma_t^2 < \infty$, $\sum_t |\gamma_{t+1} - \gamma_t| < \infty$ and
$\sum_t \gamma_t^{2-2\kappa} < \infty$ for some $\kappa \in (0, 1/2)$. Denote by $\psi_t$ the value of the projection counter at the
end of iteration $t$, in Algorithm 2. Let $(\theta_t, X_t)_{t \ge 0}$ be the sequence generated by Algorithm 2.
Under Assumptions 1 and 2, for any $M > 0$,



lim sup 



^L+l \ L+e 

\k=L J k=L 



= w.p.l, (5.39) 



where $H$, $h$, $w$, and $\mathcal{W}_M$ are given by (3.2), (3.10), (3.11), and (5.6), respectively.

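Proof. Let $M > 0$. By uniform continuity of $w$ on $\mathcal{W}_{M+1}$, let $L(M)$ be large enough so that

\[ L \ge L(M) ,\ \theta \in \mathcal{W}_M \implies \forall x \in \mathbb{X} ,\ \theta + \gamma_{L+1} H(x,\theta) \in \mathcal{W}_{M+1} . \tag{5.40} \]

Let $L \ge L(M)$ and let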



L+l 

n 

k=L 



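For any $\theta \in \mathcal{W}_M$, Lemma 5.8 implies that there exists a function $\hat H_\theta$ such that

\[ \hat H_\theta - P_\theta \hat H_\theta = H(\cdot,\theta) - \pi_\theta(H(\cdot,\theta)) \quad \text{and} \quad \sup_{x \in \mathbb{X},\, \theta \in \mathcal{W}_M} \| \hat H_\theta(x) \| < \infty . \]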
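
To make the Poisson equation concrete, here is a minimal sketch on a finite state space, where the kernel is a stochastic matrix and the solution can be obtained by summing the series $\sum_{n \ge 0} P^n (H - \pi(H))$. The 3-state matrix, the function values, and all names are illustrative assumptions; this is not the kernel $P_\theta$ of the paper.

```python
def mat_vec(P, g):
    """Apply the transition matrix to a function on the state space: (P g)(i)."""
    return [sum(P[i][j] * g[j] for j in range(len(g))) for i in range(len(P))]

def stationary(P, n_iter=500):
    """Stationary distribution by power iteration (row vector times P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def poisson_solution(P, H, n_iter=500):
    """Return (g, centered) with g ~ sum_n P^n (H - pi(H)), so that g - P g = H - pi(H)."""
    pi = stationary(P, n_iter)
    mean = sum(p * h for p, h in zip(pi, H))
    centered = [h - mean for h in H]
    g = centered[:]
    for _ in range(n_iter):
        # g <- centered + P g accumulates one more term of the series
        g = [c + pg for c, pg in zip(centered, mat_vec(P, g))]
    return g, centered

# Toy 3-state chain with strictly positive transition probabilities.
P = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
H = [1.0, -2.0, 0.5]
g, centered = poisson_solution(P, H)
residual = max(abs(gi - pgi - ci) for gi, pgi, ci in zip(g, mat_vec(P, g), centered))
print(residual)
```

The series converges geometrically here because the toy chain is uniformly ergodic; the uniform bound on $\hat H_\theta$ stated above plays the same role for the adaptive kernel.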



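Therefore, for $L + \ell \ge i \ge L \ge 0$, we have

\[ I_{L,\ell} \big( H(X_{i+1},\theta_i) - h(\theta_i) \big) = I_{L,\ell} \big( M_{i+1} + R^{(1)}_{i+1} + R^{(2)}_{i+1} \big) , \]

where

\[ M_{i+1} = \hat H_{\theta_i}(X_{i+1}) - P_{\theta_i} \hat H_{\theta_i}(X_i) , \tag{5.41} \]
\[ R^{(1)}_{i+1} = P_{\theta_i} \hat H_{\theta_i}(X_i) - P_{\theta_{i+1}} \hat H_{\theta_{i+1}}(X_{i+1}) , \tag{5.42} \]
\[ R^{(2)}_{i+1} = P_{\theta_{i+1}} \hat H_{\theta_{i+1}}(X_{i+1}) - P_{\theta_i} \hat H_{\theta_i}(X_{i+1}) . \tag{5.43} \]

First note that

\[
I_{L,\ell} \sum_{i=L}^{L+\ell} \gamma_{i+1} M_{i+1}
= I_{L,\ell} \Big( \sum_{i=0}^{L+\ell} \gamma_{i+1} I_{i,0} M_{i+1} - \sum_{i=0}^{L-1} \gamma_{i+1} I_{i,0} M_{i+1} \Big) .
\tag{5.44}
\]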



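By Lemma 5.8, $\{ I_{i,0} M_{i+1} \}_i$ is a martingale increment. Therefore, by [16], a sufficient condition
for $\sum_{i \ge 0} \gamma_{i+1} I_{i,0} M_{i+1}$ to converge to zero is

\[ \sum_{i \ge 0} \gamma_{i+1}^2\, E\big( \| \hat H_{\theta_i}(X_{i+1}) - P_{\theta_i} \hat H_{\theta_i}(X_i) \|^2\, I_{i,0} \big) < \infty . \tag{5.45} \]

By the parallelogram identity and Hölder's inequality,

\[ \| \hat H_{\theta_i}(X_{i+1}) - P_{\theta_i} \hat H_{\theta_i}(X_i) \|^2\, I_{i,0} \le 4 \sup_{x \in \mathbb{X},\, \theta \in \mathcal{W}_M} \| \hat H_\theta(x) \|^2 . \]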
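Eqn. (5.45) then holds since $\sum_t \gamma_t^2 < \infty$. By (5.44), we obtain that

\[ \lim_{L \to \infty} \sup_{\ell \ge 1} I_{L,\ell} \Big\| \sum_{k=L}^{L+\ell} \gamma_{k+1} M_{k+1} \Big\| = 0 \quad \text{w.p. 1.} \]

Let us now consider the term $R^{(1)}_{i+1}$ defined in (5.42). Summing by parts, we get

\[
\begin{aligned}
I_{L,\ell} \sum_{i=L}^{L+\ell} \gamma_{i+1} R^{(1)}_{i+1}
&= I_{L,\ell}\, \gamma_{L+1} P_{\theta_L} \hat H_{\theta_L}(X_L)
 + I_{L,\ell} \sum_{i=L+1}^{L+\ell} (\gamma_{i+1} - \gamma_i)\, P_{\theta_i} \hat H_{\theta_i}(X_i) \\
&\quad - I_{L,\ell}\, \gamma_{L+\ell+1} P_{\theta_{L+\ell+1}} \hat H_{\theta_{L+\ell+1}}(X_{L+\ell+1}) .
\end{aligned}
\]

Since $\sup_{x \in \mathbb{X},\, \theta \in \mathcal{W}_{M+1}} \| \hat H_\theta(x) \| < \infty$, there exists a constant $C$ such that the RHS is upper bounded
by $C \big( |\gamma_{L+1}| + \sum_{i=L+1}^{L+\ell} |\gamma_{i+1} - \gamma_i| + |\gamma_{L+\ell+1}| \big)$. Under the stated assumptions, this upper bound
yields

\[ \lim_{L \to \infty} \sup_{\ell \ge 1} I_{L,\ell} \Big\| \sum_{i=L}^{L+\ell} \gamma_{i+1} R^{(1)}_{i+1} \Big\| = 0 , \]

with probability 1.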
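
The summation by parts applied to the term defined in (5.42) is a purely algebraic identity, so it can be checked on arbitrary sequences. The sketch below uses a scalar sequence `a` as a stand-in for $P_{\theta_i}\hat H_{\theta_i}(X_i)$ and an arbitrary decreasing `gamma`; both are assumptions chosen only for illustration, as are the function names.

```python
import math

def weighted_differences(gamma, a, L, ell):
    """sum_{i=L}^{L+ell} gamma[i+1] * (a[i] - a[i+1])."""
    return sum(gamma[i + 1] * (a[i] - a[i + 1]) for i in range(L, L + ell + 1))

def summed_by_parts(gamma, a, L, ell):
    """Same quantity after summation by parts: two boundary terms plus a sum of step-size increments."""
    first = gamma[L + 1] * a[L]
    middle = sum((gamma[i + 1] - gamma[i]) * a[i] for i in range(L + 1, L + ell + 1))
    last = gamma[L + ell + 1] * a[L + ell + 1]
    return first + middle - last

def step_size_bound(gamma, a, L, ell):
    """C * (|gamma_{L+1}| + sum |gamma_{i+1} - gamma_i| + |gamma_{L+ell+1}|) with C = max |a|."""
    c = max(abs(v) for v in a)
    variation = sum(abs(gamma[i + 1] - gamma[i]) for i in range(L + 1, L + ell + 1))
    return c * (abs(gamma[L + 1]) + variation + abs(gamma[L + ell + 1]))

gamma = [1.0 / (i + 1) for i in range(20)]
a = [math.sin(i) for i in range(20)]
lhs = weighted_differences(gamma, a, 3, 10)
rhs = summed_by_parts(gamma, a, 3, 10)
print(abs(lhs - rhs), abs(lhs) <= step_size_bound(gamma, a, 3, 10))
```

The bound depends on the step sizes only through their values at the two endpoints and their total variation over the window, which is why summability of the step-size increments is the natural hypothesis for this term.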

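Finally, let us consider the term $R^{(2)}_{i+1}$ defined in (5.43). By (5.40), Lemma 5.12, and since on
the event $\{\psi_{k+1} = \psi_k\}$ we have $\theta_{k+1} = \theta_k + \gamma_{k+1} H(X_{k+1},\theta_k)$, we obtain

\[
\begin{aligned}
I_{L,\ell} \Big\| \sum_{i=L}^{L+\ell} \gamma_{i+1} R^{(2)}_{i+1} \Big\|
&\le I_{L,\ell} \sum_{i=L}^{L+\ell} \gamma_{i+1} \big\| P_{\theta_{i+1}} \hat H_{\theta_{i+1}} - P_{\theta_i} \hat H_{\theta_i} \big\|_\infty \\
&\le C\, I_{L,\ell} \sum_{i=L}^{L+\ell} \gamma_{i+1} \| \theta_{i+1} - \theta_i \|^{1-2\kappa}
\le C' \sum_{i=L}^{L+\ell} \gamma_{i+1}^{2-2\kappa} .
\end{aligned}
\]

This concludes the proof. □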

Lemma 5.14. Let M e (0, A/*) and set 

t¥j = {6* e e : Nh < w(e) < M} , t = inf \{Ww(9), h(e))\ . 



Under Assumptions 1 and 2, there exist 5 G (0, i) and A,/3 > such that 

(A) yVM.,0 < 7 < A, IICII u;(m + 7/i(u) + 70 < M, and 

(B) u e r];£, < 7 < A, ll^ll <P^w{u + jh{u) + 7$) < w{u) - 7(5. 

Proof. Define $u' = u + \gamma h(u) + \gamma \xi$.

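(A) Let $u \in \mathcal{W}_M$. Since $w$ is continuous on $\Theta$ and the level set $\mathcal{W}_M$ is a compact subset of $\Theta$
(see Lemma 5.6), there exists $\eta > 0$ such that for any $u \in \mathcal{W}_M$ and any $u'$ satisfying $\|u' - u\| \le \eta$,
$u' \in \mathcal{W}_{M+1}$. Therefore, since

\[ \|u' - u\| \le \lambda \big( \max_{\mathcal{W}_M} \|h\| + \beta \big) , \tag{5.46} \]

there exist $\lambda_1, \beta_1 > 0$ such that for any $0 \le \gamma \le \lambda_1$ and any $\|\xi\| \le \beta_1$, $u' \in \mathcal{W}_{M+1}$ (note that
$\max_{\mathcal{W}_M} \|h\| < \infty$ by Lemma 5.6).

Since $w$ is continuous on the compact set $\mathcal{W}_{M+1}$ (see Lemma 5.6), it is uniformly continuous
(u.c.) on $\mathcal{W}_{M+1}$. Then we can choose $\lambda_2, \beta_2 > 0$ (smaller than $\lambda_1, \beta_1$) such that

\[ \forall u \in \mathcal{W}_{M_\star} ,\ \forall \gamma \le \lambda_2 ,\ \|\xi\| \le \beta_2 , \quad |w(u) - w(u + \gamma h(u) + \gamma \xi)| \le M - M_\star . \tag{5.47} \]

This concludes the proof of (A).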

(B) Let u S r])| . Following the same lines as in the proof of (5.47), there exist Ai,/3i > 
such that for any < 7 < Ai and ||^|| < /3i, C Wm+i- By Lemma 5.6, this implies that 

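$w$ is continuously differentiable on $[u, u']$. We write

\[
\big| \langle \nabla w(u), h(u) \rangle - \langle \nabla w(u'), h(u) + \xi \rangle \big|
= \big| \langle \nabla w(u), h(u) \rangle - \langle \nabla w(u'), h(u') \rangle + \langle \nabla w(u'), h(u') - h(u) - \xi \rangle \big| .
\]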

By Lemma 5.6, ip : u 1—^ {'Vw{u),h{u)) is continuous and negative on the compact set Tjfj , so 
there exists e £ (0, i) such that {\/w{u), h{u)) < —e on T^j^. Furthermore, (p is u.c. on Wm+i, 
and, for any e' > 0, we can thus take /32 and A2 small enough so that for any < 7 < A2 and 
U\\ < /32, Hu) - v{u')\ < e'/2. Therefore 

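\[
\big| \langle \nabla w(u), h(u) \rangle - \langle \nabla w(u'), h(u) + \xi \rangle \big|
\le \epsilon'/2 + \big( \|h(u) - h(u')\| + \beta_2 \big) \max_{\mathcal{W}_{M+1}} \|\nabla w\| .
\]

Since $x \mapsto \|\nabla w(x)\|$ is continuous on the compact set $\mathcal{W}_{M+1}$, $\max_{\mathcal{W}_{M+1}} \|\nabla w\|$ is finite. As $h$ is
u.c. on $\mathcal{W}_{M+1}$, one can pick $\lambda_2, \beta_2$ small enough so that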

yuerli^ ,V7 < A2,||CI| < 132 , and \ {Ww(u), h{u)) ~ {Ww{u'), h{u) + \ < e' • 
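Finally, applying Taylor's formula, we get

\[
\begin{aligned}
w(u') - w(u)
&= \int_0^1 \big\langle \nabla w\big(u + t\gamma(h(u) + \xi)\big), \gamma(h(u) + \xi) \big\rangle \, dt \\
&= \gamma \varphi(u) + \gamma \int_0^1 \Big( \big\langle \nabla w\big(u + t\gamma(h(u) + \xi)\big), h(u) + \xi \big\rangle - \langle \nabla w(u), h(u) \rangle \Big) \, dt \\
&\le -\gamma \epsilon + \gamma \epsilon' .
\end{aligned}
\]

Since $\epsilon'$ is arbitrary, this yields (B). □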

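Proof of Item 1 in Theorem 3.2. Let $M > M_\star$, let $q$ (depending on $M$) be such that (see
Remark 5.7)

\[ \mathcal{W}_M \subset \mathcal{W}_{M+2} \subset \mathcal{K}_q , \tag{5.48} \]

and let $\theta_0 \in \mathcal{W}_M$. Let $\lambda, \beta$ be given by Lemma 5.14. By Lemma 5.6, $w$ and $h$ are uniformly
continuous on $\mathcal{W}_{M+1}$, and there exists $\eta > 0$ such that

\[ x \in \mathcal{W}_M ,\ \|x - y\| \le \eta \implies |w(x) - w(y)| \le 1 \ \text{and} \ \|h(x) - h(y)\| \le \beta . \tag{5.49} \]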
By Lemma 5.13, there exists an almost surely finite r.v. $N$ such that w.p.1,



n>N^"f,Jl+ sup \\H{x,0)\\] < XA'n, and (5.50) 



=N 



e>i 



N+t 



^7,+i(F(X,+i,&O-/i(0.)) 



<ri. (5.51) 



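The proof is by contradiction. Denote by $\psi_t$ the number of projections at the end of iteration
$t$. We assume that $P(\lim_t \psi_t = +\infty) > 0$. We can assume without loss of generality that

\[ w(\theta_N) \le M , \qquad \psi_N \ge q \]

on the set $\{\lim_t \psi_t = +\infty\}$. Define the sequence $(\theta'_{N+k})_{k \ge 0}$ as

\[ \theta'_N = \theta_N \quad \text{and} \quad \theta'_{N+k+1} = \theta'_{N+k} + \gamma_{N+k+1} h(\theta_{N+k}) . \]

We prove by induction on $k$ that for any $k \ge 0$, on the set $\{\lim_t \psi_t = +\infty\}$,

The case $k = 0$ is trivial since $\theta'_N = \theta_N \in \mathcal{W}_M$ and by using (5.49), (5.50), and (5.48) on the set
$\{\lim_t \psi_t = +\infty\}$. Assume this property holds for $k \in \{0, 1, \ldots, \ell\}$. Then we have

\[ \theta'_{N+\ell+1} = \theta'_{N+\ell} + \gamma_{N+\ell+1} h(\theta'_{N+\ell}) + \gamma_{N+\ell+1} \big( h(\theta_{N+\ell}) - h(\theta'_{N+\ell}) \big) . \]

Since $\|\theta'_{N+\ell} - \theta_{N+\ell}\| \le \eta$ and $\theta'_{N+\ell}$ is in $\mathcal{W}_M$, we have $\|h(\theta'_{N+\ell}) - h(\theta_{N+\ell})\| \le \beta$. Since
$\gamma_{N+\ell+1} \le \lambda$ by (5.50), we can apply Lemma 5.14 to obtain $\theta'_{N+\ell+1} \in \mathcal{W}_M$. In addition,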

N+e N+e 
O'n+i+1 - On+i+i = ^ li+i{H{Xi+i,9i) - h{9i))l^^^^=^. + ^ (74+1^(^1) + - ^0) 

i=N i=N 

CN+e \ N+i 

W ^e,eWM+i I X! 7i+i(-ff(-'^j+i: ^i) - h{(^i))^^i+i=^i , 
i=N ) i=N 

where we used the induction assumption in the last equality. From (5.49) and (5.51), this yields
$\|\theta'_{N+\ell+1} - \theta_{N+\ell+1}\| \le \eta$ and $w(\theta_{N+\ell+1}) \le M + 1$. Finally by (5.49), Eqs. (5.50) and (5.48) imply
that on the set $\{\lim_t \psi_t = +\infty\}$

\[ \theta_{N+\ell} + \gamma_{N+\ell+1} H(X_{N+\ell+1}, \theta_{N+\ell}) \in \mathcal{W}_{M+2} \subset \mathcal{K}_{\psi_{N+\ell}} ; \]

that is, $\psi_{N+\ell+1} = \psi_{N+\ell}$. This concludes the induction.

As a consequence of this induction, we have $\psi_{N+\ell} = \psi_N$ for any $\ell \ge 0$ on the set $\{\lim_t \psi_t = +\infty\}$,
which is a contradiction.
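
The role of the projection counter can be illustrated with a toy stochastic approximation recursion that is reset whenever the iterate leaves the current compact set. Algorithm 2 itself is not reproduced in this section, so the sketch below is only an assumed, simplified reprojection scheme: the mean field $h(\theta) = -\theta$, the bounded noise, the step sizes, and the growing intervals standing in for the compact sets are all hypothetical choices.

```python
import random

def sa_with_reprojection(n_iter=2000, theta0=0.5, seed=0):
    """Toy scalar recursion theta <- theta + gamma * H with reset on exit from K_psi.

    H = -theta + noise with |noise| <= 1, so the mean field is h(theta) = -theta.
    K_psi = [-0.6 * 2**psi, 0.6 * 2**psi] is a hypothetical growing family of compacts.
    Returns the final iterate and the number of projections (resets).
    """
    rng = random.Random(seed)
    theta, psi = theta0, 0
    for t in range(n_iter):
        gamma = (t + 1) ** -0.7                 # step size, at most 1
        noise = rng.uniform(-1.0, 1.0)
        candidate = theta + gamma * (-theta + noise)
        radius = 0.6 * 2 ** psi
        if abs(candidate) <= radius:
            theta = candidate                   # accepted: counter unchanged
        else:
            theta, psi = theta0, psi + 1        # projection: reset and enlarge the compact
    return theta, psi

theta, psi = sa_with_reprojection()
print(theta, psi)
```

In this toy setting the candidate is a convex combination of the current iterate and the noise, so it never leaves $[-1, 1]$ and at most one projection can occur. This mirrors, in a trivial case, the conclusion that the counter stays constant once the compact set is large enough.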

Proof of Item 2 in Theorem 3.2. The proof is along the same lines as the proof of Theorem 2.3 
of [1, page 5], and is thus omitted. 
5.7. Proof of Theorem 3.3

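The proof consists in checking the conditions of [12, Corollary 2.8]. Let $f$ be a measurable
bounded function.

By Lemma 5.8, (i) there exists a measurable function $\hat f_\theta$ such that $\hat f_\theta - P_\theta \hat f_\theta = f - \pi_\theta(f)$, and
(ii) for any compact set $\mathcal{W}_M$, there exists $L$ (depending upon $M$) such that

\[ \forall \theta \in \mathcal{W}_M ,\ x \in \mathbb{X} , \quad |\hat f_\theta(x)| \le L . \]

By Theorem 3.2, $P(\Omega_M) \uparrow 1$ when $M$ tends to infinity where

\[ \Omega_M = \bigcap_{t \ge 0} \{ \theta_t \in \mathcal{W}_M \} . \]

Therefore, in order to apply [12, Corollary 2.8], we only have to prove that almost surely,

\[ \sum_{k \ge 1} k^{-1} \sup_x \| P_{\theta_k}(x,\cdot) - P_{\theta_{k-1}}(x,\cdot) \|_{TV}\, \mathbb{1}_{\Omega_M} < \infty , \tag{5.52} \]
\[ \lim_t \pi_{\theta_t}(f)\, \mathbb{1}_{\Omega_M} = \pi_{\theta^\star}(f)\, \mathbb{1}_{\Omega_M} . \tag{5.53} \]

By Lemma 5.11, there exists $C$ and $\kappa \in (0, 1/2)$ such that

\[ \sup_x \| P_{\theta_k}(x,\cdot) - P_{\theta_{k-1}}(x,\cdot) \|_{TV}\, \mathbb{1}_{\Omega_M} \le C \| \theta_k - \theta_{k-1} \|^{1-2\kappa} . \]

In addition, by Theorem 3.2, there exists a random variable $K$, almost surely finite, such that
for any $k \ge K$,

\[ \| \theta_k - \theta_{k-1} \|\, \mathbb{1}_{\Omega_M} \le \gamma_k \sup_{x \in \mathbb{X},\, \theta \in \mathcal{W}_M} \| H(x,\theta) \| . \]

This yields

\[ \sum_{k \ge K} k^{-1} \sup_x \| P_{\theta_k}(x,\cdot) - P_{\theta_{k-1}}(x,\cdot) \|_{TV}\, \mathbb{1}_{\Omega_M} \le C \sum_{k \ge K} k^{-1} \gamma_k^{1-2\kappa} , \]

for some constant $C > 0$. This concludes the proof of (5.52). The limit (5.53) is a consequence
of Lemma 5.10.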

5.8. Proof of Theorem 3.4 

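Let $f$ be a measurable function such that $\|f\|_\infty \le 1$ and set

\[ I_t(f) = \big| E[f(X_t)\mathbb{1}_B] - \pi_{\theta^\star}(f) P(B) \big| = \big| E[(f(X_t) - \pi_{\theta^\star}(f))\mathbb{1}_B] \big| . \]

Let $\epsilon > 0$. We prove that there exists $T_\epsilon$ such that for all $t \ge T_\epsilon$, $\sup_{\{f : \|f\|_\infty \le 1\}} I_t(f) \le 4\epsilon$.
Choose $\kappa \in (0, 1/2)$ and $\delta > 0$ such that

\[ C_{M_\star+1}\, \delta^{1-2\kappa} \le \epsilon , \tag{5.54} \]

where $M_\star$ and $C_{M_\star+1}$ are defined in Assumption 2 and in Lemma 5.10, respectively. Choose $r_\epsilon$
such that

\[ 2 (1 - \rho_{M_\star+1})^{r_\epsilon} \le \epsilon , \tag{5.55} \]

where $\rho_{M_\star+1}$ is defined in Lemma 5.8. By uniform continuity of $w$ on $\mathcal{W}_{M_\star+2}$, assume finally $\delta$
is small enough that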



e WM.+ud' e Q,\\e-e'\\ < S =^ \w{9)-w{e')\ < 



1 



(5.56) 



There exists such that for any t >T^, 



- < (5,lim6l, = r ) < e/2 



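Hence, for any $t \ge T^1_\epsilon$, $I_t(f) \le \sum_{i=1}^{3} I^i_t(f) + \epsilon$, where

\[ I^1_t(f) = \Big| E\Big[ \big( f(X_t) - P^{r_\epsilon}_{\theta_{t-r_\epsilon}} f(X_{t-r_\epsilon}) \big) \mathbb{1}_{\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta} \Big] \Big| , \tag{5.57} \]
\[ I^2_t(f) = \Big| E\Big[ \big( P^{r_\epsilon}_{\theta_{t-r_\epsilon}} f(X_{t-r_\epsilon}) - \pi_{\theta_{t-r_\epsilon}}(f) \big) \mathbb{1}_{\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta} \Big] \Big| , \tag{5.58} \]
\[ I^3_t(f) = \Big| E\Big[ \big( \pi_{\theta_{t-r_\epsilon}}(f) - \pi_{\theta^\star}(f) \big) \mathbb{1}_{\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta} \Big] \Big| . \tag{5.59} \]

We first upper bound $I^1_t(f)$. For $\theta, \theta' \in \Theta$, let

\[ D(\theta,\theta') = \sup_x \| P_\theta(x,\cdot) - P_{\theta'}(x,\cdot) \|_{TV} . \]

Applying [4, Proposition 1.3.1], we obtain for any $t \ge T^1_\epsilon$,

\[
\begin{aligned}
I^1_t(f)
&\le E\Big[ 2 \wedge \sum_{j=1}^{r_\epsilon - 1} D(\theta_{t-r_\epsilon+j}, \theta_{t-r_\epsilon})\, \mathbb{1}_{\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta} \Big] \\
&\le E\Big[ 2 \wedge \sum_{j=1}^{r_\epsilon - 1} (r_\epsilon - j)\, D(\theta_{t-r_\epsilon+j}, \theta_{t-r_\epsilon+j-1})\, \mathbb{1}_{\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta} \Big] ,
\end{aligned}
\]

where we used that for any $q, \ell \ge 0$, $D(\theta_{q+\ell}, \theta_q) \le \sum_{j=1}^{\ell} D(\theta_{q+j}, \theta_{q+j-1})$. By Proposition 1,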
the random iteration number where the last projection occurs in Algorithm 2 is finite with 
probability one. Let then be such that 2P(r^ > M^) < e/2, so that 



2 A ^ (r, - j)D{9t-r 



Let now > T} V (Af^ + r,) be such that 

i>T2^7t sup \\H{x,9)\\<5 
Then, by induction and using (5.56), we obtain that on $\{\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta\}$, $\theta_{t-r_\epsilon+j} \in \mathcal{W}_{M_\star+1}$
for all $0 \le j \le r_\epsilon$. By Lemma 5.11 this yields for any t ≥



+j ' 2 



and there exists > such that t>T^ ^ sup{y.|| j||^<i} ll{f) < e. 
We now consider $I^2_t(f)$; it holds



I? < E 



ITV 11"* 



-,■ -0*||<(5 



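By (5.56), $\|\theta_{t-r_\epsilon} - \theta^\star\| \le \delta \implies \theta_{t-r_\epsilon} \in \mathcal{W}_{M_\star+1}$ and thus, applying Lemma 5.8 and (5.55),

\[ \sup_{\{f : \|f\|_\infty \le 1\}} I^2_t(f) \le 2 (1 - \rho_{M_\star+1})^{r_\epsilon} \le \epsilon . \]

The derivation of the upper bound of $I^3_t$ is similar to that of $I^2_t$, with Lemma 5.8 replaced by
Lemma 5.10 and uses (5.54). Details are omitted.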